Abstract



The aim of this paper is to investigate extremum problems with pay-off being the total variational 
distance metric defined on the space of probability measures, subject to linear functional constraints 
on the space of probability measures, and vice-versa; that is, with the roles of total variational metric 
and linear functional interchanged. Utilizing concepts from signed measures, the extremum probability 
measures of such problems are obtained in closed form, by identifying the partition of the support 
set and the mass of these extremum measures on the partition. The results are derived for abstract 
spaces; specifically, complete separable metric spaces known as Polish spaces, while the high level
ideas are also discussed for denumerable spaces endowed with the discrete topology. These extremum
problems arise in many areas, such as approximating a family of probability distributions by a
given probability distribution, maximizing or minimizing entropy subject to total variational distance
metric constraints, quantifying uncertainty of probability distributions by total variational distance metric,
stochastic minimax control, and in many problems of information, decision theory, and minimax theory.
Keywords: Total variational distance, extremum probability measures, signed measures. 



I. Introduction 

Total variational distance metric on the space of probability measures is a fundamental quantity 
in statistics and probability, which over the years appeared in many diverse applications. In 
information theory it is used to define strong typicality and asymptotic equipartition of sequences
generated by sampling from a given distribution [1]. In decision problems, it arises naturally 
when discriminating the results of observation of two statistical hypotheses [1]. In studying 
the ergodicity of Markov Chains, it is used to define the Dobrushin coefficient and establish 
the contraction property of transition probability distributions [2]. Moreover, distance in total 
variation of probability measures is related via upper and lower bounds to an anthology of 
distances and distance metrics [3]. The measure of distance in total variation of probability 
measures is a strong form of closeness of probability measures, and, convergence with respect 
to total variation of probability measures implies their convergence with respect to other distances 
and distance metrics. 

In this paper, we formulate and solve several extremum problems involving the total variational 
distance metric and we discuss their applications. The main problems investigated are the 
following. 

(a) Extremum problems of linear functionals on the space of measures subject to a total 
variational distance metric constraint defined on the space of measures. 

(b) Extremum problems of total variational distance metric on the space of measures subject 
to linear functionals on the space of measures. 

(c) Applications of these extremum problems, and their relations to other problems. 

The formulation of these extremum problems, their discussion in terms of applications, and the 
contributions of this paper are developed at the abstract level, in which systems are represented 
by probability distributions on abstract spaces (complete separable metric space, known as Polish 
spaces [4]), pay-offs are represented by linear functionals on the space of probability measures or 
by distance in variation of probability measures, and constraints by linear functionals or distance 
in variation of probability measures. We consider Polish spaces since they are general enough 
to handle various models of practical interest. 

Utilizing concepts from signed measures, closed form expressions of the probability measures 
are derived which achieve the extremum of these problems. The construction of the extremum 
measures involves the identification of the partition of their support set, and their mass defined on 
these partitions. Throughout the derivations we make extensive use of lower and upper bounds 
of pay-offs which are achievable. Several simulations are carried out to illustrate the different 
features of the extremum solution of the various problems. An interesting observation concerning 
one of the extremum problems is its equivalent formulation as an extremum problem involving
the oscillator semi-norm of the pay-off functional. The formulation and results obtained for 
these problems at the abstract level are discussed throughout the paper in the context of various 
applications, often assuming denumerable spaces endowed with the discrete topology. Some 
specific envisioned applications of the theory developed are listed below. 

(i) Dynamic Programming Under Uncertainty, to deal with uncertainty of transition probability
distributions, via minimax theory, with total variational distance metric uncertainty
constraints to codify the impact of incorrect distribution models on performance of the 
optimal strategies [5]. This formulation is applicable to Markov decision problems subject 
to uncertainty. 

(ii) Approximation of Probability Distributions with Total Variational Distance Metric, to
approximate a given probability distribution $\mu$ on a measurable space $(\Sigma, \mathcal{B}(\Sigma))$ by another
distribution $\nu$ on $(\Sigma, \mathcal{B}(\Sigma))$, via minimization of the total variational distance metric between
them subject to linear functional constraints. Model and graph reduction can be handled via 
such approximations. 

(iii) Maximization or Minimization of Entropy Subject to Total Variational Distance Metric 
Constraints, to invoke insufficient reasoning based on maximizing the entropy $H(\nu)$ of an
unknown probability distribution $\nu$ on a denumerable space $\Sigma$ subject to a constraint on the
total variational distance metric. 

The rest of the paper is organized as follows. In section II, total variational distance is defined, 
the extremum problems are introduced, while several related problems are discussed together 
with their applications. In section III, some of the properties of the problems are discussed. In 
section III-A, signed measures are utilized to convert the extremum problems into equivalent 
ones, and to characterize the extremum measures on abstract spaces. In section IV, closed form 
expressions of the extremum measures are derived for finite alphabet spaces. In section V, the 
relation between total variational distance and other distance metrics is discussed. Finally, in 
section VI several examples are worked out to illustrate how the optimal solution of extremum 
problems behaves by examining different scenarios concerning the partition of the space $\Sigma$.

II. Extremum Problems

In this section, we introduce the extremum problems we shall investigate. Let $(\Sigma, d_\Sigma)$
denote a complete, separable metric space and $(\Sigma, \mathcal{B}(\Sigma))$ the corresponding measurable space,
where $\mathcal{B}(\Sigma)$ is the $\sigma$-algebra generated by the open sets in $\Sigma$. Let $\mathcal{M}_1(\Sigma)$ denote the set of probability
measures on $\mathcal{B}(\Sigma)$. The total variational distance$^1$ is a metric [6] $d_{TV} : \mathcal{M}_1(\Sigma) \times \mathcal{M}_1(\Sigma) \to [0,\infty)$
defined by

\[ d_{TV}(\alpha,\beta) = \|\alpha-\beta\|_{TV} = \sup_{P \in \mathcal{P}(\Sigma)} \sum_{F_i \in P} |\alpha(F_i) - \beta(F_i)|, \tag{1} \]

where $\alpha, \beta \in \mathcal{M}_1(\Sigma)$ and $\mathcal{P}(\Sigma)$ denotes the collection of all finite partitions of $\Sigma$. With respect
to this metric, $(\mathcal{M}_1(\Sigma), d_{TV})$ is a complete metric space. Since the elements of $\mathcal{M}_1(\Sigma)$ are
probability measures, $d_{TV}(\alpha,\beta) \le 2$. In minimax problems one can introduce an uncertainty
set based on distance in variation as follows. Suppose the probability measure $\nu \in \mathcal{M}_1(\Sigma)$ is
unknown, while modeling techniques give access to a nominal probability measure $\mu \in \mathcal{M}_1(\Sigma)$.
Having constructed the nominal probability measure, one may estimate from empirical data the
distance between the two measures with respect to the total variational distance $\|\nu - \mu\|_{TV}$. This will
provide an estimate of the radius $R$ such that $\|\nu - \mu\|_{TV} \le R$, and hence characterize the set of
all possible true measures $\nu \in \mathcal{M}_1(\Sigma)$, centered at the nominal distribution $\mu \in \mathcal{M}_1(\Sigma)$ and lying
within the ball of radius $R$ with respect to the total variational distance $\|\cdot\|_{TV}$. Such a procedure
is used in information theory to define strong typicality of sequences. Unlike other distances
used in the past such as relative entropy [7]-[11], quantifying uncertainty via the metric $\|\cdot\|_{TV}$
does not require absolute continuity of measures$^2$, i.e., singular measures are admissible, and
hence $\nu$ and $\mu$ need not be defined on the same space. Thus, the support set of $\mu$ may be $\tilde{\Sigma} \subset \Sigma$,
hence $\mu(\Sigma \setminus \tilde{\Sigma}) = 0$ but $\nu(\Sigma \setminus \tilde{\Sigma}) \neq 0$ is allowed. For measures induced by stochastic differential
equations (SDEs), the variational distance uncertainty set models situations in which both the drift
and diffusion coefficient of the SDEs are unknown.

$^1$ The definition of total variation distance can be extended to signed measures.

$^2$ $\nu \in \mathcal{M}_1(\Sigma)$ is absolutely continuous with respect to $\mu \in \mathcal{M}_1(\Sigma)$, denoted by $\nu \ll \mu$, if $\mu(A) = 0$ for some $A \in \mathcal{B}(\Sigma)$ implies
$\nu(A) = 0$.
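
As a concrete illustration of definition (1), the following minimal sketch evaluates the total variational distance on a finite alphabet, where the supremum over partitions is attained by the partition into singletons, so the distance reduces to an $\ell_1$ sum. The function names `tv_distance` and `in_ball` and the numerical vectors are illustrative assumptions, not part of the paper.

```python
def tv_distance(alpha, beta):
    """Total variational distance between two probability vectors.

    On a finite alphabet the supremum in (1) is attained by the partition
    into singletons, so the distance is the l1 distance (at most 2).
    """
    if len(alpha) != len(beta):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(alpha, beta))


def in_ball(nu, mu, radius, tol=1e-12):
    """Return True when nu lies in the ball of the given radius around mu."""
    return tv_distance(nu, mu) <= radius + tol


# Hypothetical example: mu has no mass on the last symbol, nu does, so nu is
# not absolutely continuous with respect to mu, yet the distance is finite.
mu = [0.5, 0.3, 0.2, 0.0]
nu = [0.4, 0.3, 0.2, 0.1]
print(tv_distance(nu, mu))
print(in_ball(nu, mu, 0.25))
```

The example shows the point made above: relative entropy of `nu` with respect to `mu` would be infinite here, while the variational distance still quantifies the mismatch.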

Define the spaces

\[ BC(\Sigma) = \{\ell : \Sigma \to \mathbb{R} : \ell \text{ are bounded continuous}\}, \]
\[ BM(\Sigma) = \{\ell : \Sigma \to \mathbb{R} : \ell \text{ are bounded measurable functions}\}, \]
\[ BC^+(\Sigma) = \{\ell \in BC(\Sigma) : \ell \ge 0\}, \qquad BM^+(\Sigma) = \{\ell \in BM(\Sigma) : \ell \ge 0\}. \]

$BC(\Sigma)$ and $BM(\Sigma)$ endowed with the sup norm $\|\ell\| = \sup_{x \in \Sigma} |\ell(x)|$ are Banach spaces [6]. Next,
we introduce the two main extremum problems we shall investigate in this paper.

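Problem II.1. Given a fixed nominal distribution $\mu \in \mathcal{M}_1(\Sigma)$ and a parameter $R \in [0,2]$, define
the class of true distributions by

\[ \mathbb{B}_R(\mu) = \{\nu \in \mathcal{M}_1(\Sigma) : \|\nu - \mu\|_{TV} \le R\}, \tag{2} \]

and the average pay-off with respect to the true probability measure $\nu \in \mathbb{B}_R(\mu)$ by

\[ L_1(\nu) = \int_\Sigma \ell(x)\nu(dx), \qquad \ell \in BC^+(\Sigma) \text{ or } BM^+(\Sigma). \tag{3} \]

The objective is to find the extremum of the pay-off

\[ D^+(R) = \sup_{\nu \in \mathbb{B}_R(\mu)} \int_\Sigma \ell(x)\nu(dx). \tag{4} \]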

Problem II.1 is a convex optimization problem on the space of probability measures. Note that
$BC^+(\Sigma)$, $BM^+(\Sigma)$ can be generalized to $L^{\infty,+}(\Sigma, \mathcal{B}(\Sigma), \nu)$, the set of all $\mathcal{B}(\Sigma)$-measurable,
non-negative essentially bounded functions defined $\nu$-a.e., endowed with the essential supremum
norm $\|\ell\|_{\infty,\nu} = \nu\text{-ess}\sup_{x \in \Sigma} \ell(x) = \inf_{\Delta \in \mathcal{N}_\nu} \sup_{x \in \Delta^c} \|\ell(x)\|$, where $\mathcal{N}_\nu = \{A \in \mathcal{B}(\Sigma) : \nu(A) = 0\}$.

In the context of minimax theory, Problem II.1 is important in uncertain stochastic control,
estimation, and decision, formulated via minimax optimization. Such formulations are found in
[7]-[11] utilizing relative entropy uncertainty, and in [12], [13] utilizing $L_1$ distance uncertainty.
In the context of dynamic programming this is discussed in [14]. The second extremum problem
is defined below.

Problem II.2. Given a fixed nominal distribution $\mu \in \mathcal{M}_1(\Sigma)$ and a parameter $D \in [0, \infty)$, define
the class of true distributions by

\[ \mathbb{Q}(D) = \Big\{\nu \in \mathcal{M}_1(\Sigma) : \int_\Sigma \ell(x)\nu(dx) \le D\Big\}, \qquad \ell \in BC^+(\Sigma) \text{ or } BM^+(\Sigma), \tag{5} \]

and the total variation pay-off with respect to the true probability measure $\nu \in \mathbb{Q}(D)$ by

\[ L_2(\nu) = \|\nu - \mu\|_{TV}. \tag{6} \]

The objective is to find the extremum of the pay-off

\[ R^-(D) = \inf_{\nu \in \mathbb{Q}(D)} \|\nu - \mu\|_{TV}, \tag{7} \]

whenever$^3$ $\int_\Sigma \ell(x)\mu(dx) > D$.

Problem II.2 is important in the context of approximation theory, since distance in variation is a
measure of proximity of two probability distributions subject to constraints. It is also important in
spectral measure or density approximation as follows. Recall that a function $\{R(\tau) : -\infty \le \tau \le \infty\}$
is the covariance function of a quadratic mean continuous and wide-sense stationary process if
and only if it is of the form [15]

\[ R(\tau) = \int_{-\infty}^{\infty} e^{2\pi i \nu \tau} F(d\nu), \]

where $F(\cdot)$ is a finite Borel measure on $\mathbb{R}$, called the spectral measure. Thus, by proper normalization
of $F(\cdot)$ via $F_N(d\nu) = \frac{1}{R(0)} F(d\nu)$, $F_N(d\nu)$ is a probability measure on $\mathcal{B}(\mathbb{R})$, and hence
Problem II.2 can be used to approximate the class of spectral measures which satisfy moment
estimates. Spectral estimation problems are discussed extensively in [16]-[20], utilizing relative
entropy and Hellinger distances. However, in these references, the approximated spectral density
is absolutely continuous with respect to the nominal spectral density; hence, it cannot deal
with reduced order approximation. In this respect, distance in total variation between spectral
measures is very attractive.

A. Related Extremum Problems 

Problems II.1, II.2 are related to additional extremum problems which are introduced below.
(1) The solution of (4) gives the solution to the problem defined by

\[ R^+(D) = \sup_{\nu \in \mathcal{M}_1(\Sigma) :\, \int_\Sigma \ell(x)\nu(dx) \le D} \|\nu - \mu\|_{TV}. \tag{8} \]

Specifically, $R^+(D)$ is the inverse mapping of $D^+(R)$. $D^+(R)$ is investigated in [21] in
the context of minimax stochastic control under uncertainty, following an alternative
approach which utilizes large deviation theory to express the extremum measure by a convex
combination of a tilted and the nominal probability measures. The two disadvantages of
the method pursued in [8]-[11] are the following. 1) No explicit closed form expression
for the extremum measure is given, and as a consequence, 2) its application to dynamic
programming is restricted to a class of uncertain probability measures which are absolutely
continuous with respect to the nominal measure $\mu \in \mathcal{M}_1(\Sigma)$.

$^3$ If $\int_\Sigma \ell(x)\mu(dx) \le D$ then $\nu^* = \mu$ is the trivial extremum measure of (7).

(2) Let $\nu$ and $\mu$ be absolutely continuous with respect to the Lebesgue measure so that $\varphi(x) =
\frac{d\nu}{dx}(x)$, $\psi(x) = \frac{d\mu}{dx}(x)$ (i.e., $\varphi(\cdot)$, $\psi(\cdot)$ are the probability density functions of $\nu(\cdot)$ and $\mu(\cdot)$,
respectively). Then, $\|\nu - \mu\|_{TV} = \int_\Sigma |\varphi(x) - \psi(x)|\,dx$ and hence, (4) and (8) are $L_1$-distance
optimization problems.

(3) Let $\Sigma$ be a non-empty denumerable set endowed with the discrete topology, including
finite cardinality $|\Sigma|$, with $\mathcal{M}_1(\Sigma)$ identified with the standard probability simplex in $\mathbb{R}^{|\Sigma|}$,
that is, the set of all $|\Sigma|$-dimensional vectors which are probability vectors, and $\ell(x) =
-\log \nu(x)$, $x \in \Sigma$, where $\{\nu(x) : x \in \Sigma\} \in \mathcal{M}_1(\Sigma)$, $\{\mu(x) : x \in \Sigma\} \in \mathcal{M}_1(\Sigma)$. Then (4) is
equivalent to maximizing the entropy of $\{\nu(x) : x \in \Sigma\}$ subject to a total variational distance
metric constraint, defined by

\[ D^+(R) = \sup_{\nu \in \mathcal{M}_1(\Sigma) :\, \sum_{x \in \Sigma} |\nu(x) - \mu(x)| \le R} H(\nu). \tag{9} \]

Problem (9) is of interest when the concept of insufficient reasoning (e.g., Jaynes' maximum
entropy principle [22], [23]) is applied to construct a model for $\nu \in \mathcal{M}_1(\Sigma)$, subject to
information quantified via total variational distance metric between $\nu$ and an empirical
distribution $\mu$. In the context of stochastic uncertain control systems, and its relation to
robustness, Problem (9) with the total variational distance constraint replaced by relative
entropy distance constraint is investigated in [24], [25].
(4) The solution of (7) gives the solution to the problem defined by

\[ D^-(R) = \inf_{\nu \in \mathcal{M}_1(\Sigma) :\, \|\nu - \mu\|_{TV} \le R} \int_\Sigma \ell(x)\nu(dx). \tag{10} \]

Problems (7) and (10) are important in approximating a class of probability distributions or
spectral measures by reduced ones. In fact, the solution of (10) is obtained precisely as that
of Problem II.1, with a reverse computation of the partition of the space $\Sigma$ and the mass of
the extremum measure on the partition moving in the opposite direction.

III. Characterization of Extremum Measures on Abstract Spaces

This section utilizes signed measures and some of their properties to convert Problems II.1, II.2
into equivalent extremum problems. First, we discuss some of the properties of these extremum
problems.

Lemma III.1.

(1) $D^+(R)$ is a non-decreasing concave function of $R$, and

\[ D^+(R) = \sup_{\|\nu - \mu\|_{TV} = R} \int_\Sigma \ell(x)\nu(dx), \quad \text{if } R \le R_{\max}, \tag{11} \]

where $R_{\max}$ is the smallest non-negative number belonging to $[0,2]$ such that $D^+(R)$ is
constant in $[R_{\max}, 2]$.

(2) $R^-(D)$ is a non-increasing convex function of $D$, and

\[ R^-(D) = \inf_{\int_\Sigma \ell(x)\nu(dx) = D} \|\nu - \mu\|_{TV}, \quad \text{if } D \le D_{\max}, \tag{12} \]

where $D_{\max}$ is the smallest non-negative number belonging to $[0, \infty)$ such that $R^-(D) = 0$
for any $D \in [D_{\max}, \infty)$.

Proof: (1) Suppose $0 \le R_1 \le R_2$; then for every $\nu \in \mathbb{B}_{R_1}(\mu)$ we have $\|\nu - \mu\|_{TV} \le R_1 \le R_2$,
and therefore $\nu \in \mathbb{B}_{R_2}(\mu)$, hence

\[ \sup_{\nu \in \mathbb{B}_{R_1}(\mu)} \int_\Sigma \ell(x)\nu(dx) \le \sup_{\nu \in \mathbb{B}_{R_2}(\mu)} \int_\Sigma \ell(x)\nu(dx), \]

which is equivalent to $D^+(R_1) \le D^+(R_2)$. So $D^+(R)$ is a non-decreasing function of $R$. Now
consider two points $(R_1, D^+(R_1))$ and $(R_2, D^+(R_2))$ on the linear functional curve, such that
$\nu_1 \in \mathbb{B}_{R_1}(\mu)$ achieves the supremum of (4) for $R_1$, and $\nu_2 \in \mathbb{B}_{R_2}(\mu)$ achieves the supremum of
(4) for $R_2$. Then, $\|\nu_1 - \mu\|_{TV} \le R_1$ and $\|\nu_2 - \mu\|_{TV} \le R_2$. For any $\lambda \in (0,1)$, we have

\[ \|\lambda\nu_1 + (1-\lambda)\nu_2 - \mu\|_{TV} \le \lambda\|\nu_1 - \mu\|_{TV} + (1-\lambda)\|\nu_2 - \mu\|_{TV} \le \lambda R_1 + (1-\lambda)R_2 = R. \]

Define $\nu^* = \lambda\nu_1 + (1-\lambda)\nu_2$, $R = \lambda R_1 + (1-\lambda)R_2$. The previous inequality implies that $\nu^* \in
\mathbb{B}_R(\mu)$, hence $D^+(\lambda R_1 + (1-\lambda)R_2) \ge \int_\Sigma \ell(x)\nu^*(dx)$. Therefore,

\[ \begin{aligned} D^+(R) = \sup_{\nu \in \mathbb{B}_R(\mu)} \int_\Sigma \ell(x)\nu(dx) &\ge \int_\Sigma \ell(x)\nu^*(dx) = \int_\Sigma \ell(x)\big(\lambda\nu_1(dx) + (1-\lambda)\nu_2(dx)\big) \\ &= \lambda \int_\Sigma \ell(x)\nu_1(dx) + (1-\lambda) \int_\Sigma \ell(x)\nu_2(dx) = \lambda D^+(R_1) + (1-\lambda) D^+(R_2). \end{aligned} \]

So, $D^+(R)$ is a concave function of $R$. Also the right side of (11), say $\bar{D}^+(R)$, is a concave function
of $R$. But $D^+(R) = \sup_{R' \le R} \bar{D}^+(R')$, which completes the derivation of (11).
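(2) Suppose $0 \le D_1 \le D_2$; then $\mathbb{Q}(D_1) \subset \mathbb{Q}(D_2)$, and $\inf_{\nu \in \mathbb{Q}(D_1)} \|\nu - \mu\|_{TV} \ge \inf_{\nu \in \mathbb{Q}(D_2)} \|\nu -
\mu\|_{TV}$, which is equivalent to $R^-(D_1) \ge R^-(D_2)$. Hence, $R^-(D)$ is a non-increasing function of
$D$. Now consider two points $(D_1, R^-(D_1))$ and $(D_2, R^-(D_2))$ on the total variation curve. Let
$D = \lambda D_1 + (1-\lambda)D_2$, $\nu^* = \lambda\nu_1 + (1-\lambda)\nu_2$ and $\nu_1 \in \mathbb{Q}(D_1)$, $\nu_2 \in \mathbb{Q}(D_2)$ such that $\|\nu_1 - \mu\|_{TV} =
R^-(D_1)$ and $\|\nu_2 - \mu\|_{TV} = R^-(D_2)$. Then, $\int_\Sigma \ell(x)\nu_1(dx) \le D_1$ and $\int_\Sigma \ell(x)\nu_2(dx) \le D_2$. Taking
the convex combination leads to

\[ \lambda \int_\Sigma \ell(x)\nu_1(dx) + (1-\lambda) \int_\Sigma \ell(x)\nu_2(dx) \le \lambda D_1 + (1-\lambda) D_2 = D, \]

and hence $\nu^* \in \mathbb{Q}(D)$. So,

\[ \begin{aligned} R^-(D) = \inf_{\nu \in \mathbb{Q}(D)} \|\nu - \mu\|_{TV} &\le \|\nu^* - \mu\|_{TV} = \|\lambda\nu_1 + (1-\lambda)\nu_2 - \mu\|_{TV} \\ &\le \lambda\|\nu_1 - \mu\|_{TV} + (1-\lambda)\|\nu_2 - \mu\|_{TV} = \lambda R^-(D_1) + (1-\lambda) R^-(D_2). \end{aligned} \]

This shows that $R^-(D)$ is a convex function of $D$. Also the right side of (12), say $\bar{R}^-(D)$, is a convex
function of $D$. But $R^-(D) = \inf_{D' \le D} \bar{R}^-(D')$, which completes the derivation of (12). ■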
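
Part (1) of the lemma can be checked numerically on a finite alphabet. The sketch below is only a sanity check under stated assumptions: the helper `d_plus` is a hypothetical greedy evaluation of the supremum in (4) (shift up to $R/2$ of mass onto the maximizers of $\ell$, taking it from the smallest values of $\ell$ first), consistent with the finite alphabet characterization of Section IV, and the vectors `ell` and `mu` are made up for illustration.

```python
def d_plus(ell, mu, radius):
    """Greedy evaluation of D^+(R) for finite alphabets (illustrative sketch)."""
    top = max(ell)
    mass_on_top = sum(m for l, m in zip(ell, mu) if l == top)
    # At most R/2 of mass can move, and no more than the mass outside the maximisers.
    remaining = min(radius / 2.0, 1.0 - mass_on_top)
    value = sum(l * m for l, m in zip(ell, mu))  # nominal average pay-off
    for l, m in sorted(zip(ell, mu)):            # smallest pay-off values first
        if l == top or remaining <= 0:
            break
        take = min(m, remaining)
        value += (top - l) * take                # gain from moving `take` to the top
        remaining -= take
    return value


ell = [4.0, 1.0, 2.0, 3.0]
mu = [0.1, 0.2, 0.3, 0.4]
grid = [0.1 * k for k in range(21)]              # R in [0, 2]
values = [d_plus(ell, mu, r) for r in grid]
increments = [b - a for a, b in zip(values, values[1:])]

non_decreasing = all(inc >= -1e-12 for inc in increments)
concave = all(b <= a + 1e-9 for a, b in zip(increments, increments[1:]))
print(non_decreasing, concave)
```

On this example the increments shrink as $R$ grows and vanish once all mass sits on the maximizer, which is the behaviour the lemma describes for $R \ge R_{\max}$.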
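Let $\mathcal{M}_{sm}(\Sigma)$ denote the set of finite signed measures. Then, any $\eta \in \mathcal{M}_{sm}(\Sigma)$ has a Jordan
decomposition [26] $\{\eta^+, \eta^-\}$ such that $\eta = \eta^+ - \eta^-$, and the total variation of $\eta$ is defined by
$\|\eta\|_{TV} = \eta^+(\Sigma) + \eta^-(\Sigma)$. Define the following subset: $\mathcal{M}_0(\Sigma) = \{\eta \in \mathcal{M}_{sm}(\Sigma) : \eta(\Sigma) = 0\}$. For
$\xi \in \mathcal{M}_0(\Sigma)$ we have $\xi(\Sigma) = 0$, which implies that $\xi^+(\Sigma) = \xi^-(\Sigma)$, and hence $\xi^+(\Sigma) = \xi^-(\Sigma) =
\frac{\|\xi\|_{TV}}{2}$. Then, $\xi = \nu - \mu \in \mathcal{M}_0(\Sigma)$ and hence $\xi = (\nu - \mu)^+ - (\nu - \mu)^- = \xi^+ - \xi^-$.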

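A. Equivalent Extremum Problem of $D^+(R)$

Consider the pay-off of Problem II.1, for $\ell \in BC^+(\Sigma)$, and write $\xi = \nu - \mu$. Then the following inequalities hold.

\[ \begin{aligned} L_1(\nu) = \int_\Sigma \ell(x)\nu(dx) &\overset{(a)}{=} \int_\Sigma \ell(x)\big(\xi^+(dx) - \xi^-(dx)\big) + \int_\Sigma \ell(x)\mu(dx) \\ &\overset{(b)}{\le} \sup_{x \in \Sigma} \ell(x)\,\xi^+(\Sigma) - \inf_{x \in \Sigma} \ell(x)\,\xi^-(\Sigma) + \int_\Sigma \ell(x)\mu(dx) \\ &\overset{(c)}{=} \sup_{x \in \Sigma} \ell(x)\,\frac{\|\xi\|_{TV}}{2} - \inf_{x \in \Sigma} \ell(x)\,\frac{\|\xi\|_{TV}}{2} + \int_\Sigma \ell(x)\mu(dx) \\ &= \Big\{\sup_{x \in \Sigma} \ell(x) - \inf_{x \in \Sigma} \ell(x)\Big\} \frac{\|\xi\|_{TV}}{2} + \int_\Sigma \ell(x)\mu(dx), \end{aligned} \tag{13} \]

where (a) follows by adding and subtracting $\int_\Sigma \ell\,d\mu$, and from the Jordan decomposition of
$(\nu - \mu)$, (b) follows due to $\ell \in BC^+(\Sigma)$, and (c) follows because any $\xi \in \mathcal{M}_0(\Sigma)$ satisfies $\xi^+(\Sigma) =
\xi^-(\Sigma) = \frac{1}{2}\|\xi\|_{TV}$. For a given $\mu \in \mathcal{M}_1(\Sigma)$ and $\nu \in \mathbb{B}_R(\mu)$ define the set

\[ \widetilde{\mathbb{B}}_R(\mu) = \{\xi \in \mathcal{M}_0(\Sigma) : \xi = \nu - \mu,\ \nu \in \mathcal{M}_1(\Sigma),\ \|\xi\|_{TV} \le R\}. \]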

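The upper bound on the right hand side of (13) is achieved by $\xi^* \in \widetilde{\mathbb{B}}_R(\mu)$ as follows. Let

\[ x^0 \in \Sigma^0 = \{x \in \Sigma : \ell(x) = \sup\{\ell(x) : x \in \Sigma\} \equiv M\}, \]
\[ x_0 \in \Sigma_0 = \{x \in \Sigma : \ell(x) = \inf\{\ell(x) : x \in \Sigma\} \equiv m\}. \]

Take

\[ \xi^*(dx) = \nu^*(dx) - \mu(dx) = \frac{R}{2}\big(\delta_{x^0}(dx) - \delta_{x_0}(dx)\big), \tag{14} \]

where $\delta_y(dx)$ denotes the Dirac measure concentrated at $y \in \Sigma$. This is indeed a signed measure
with total variation $\|\xi^*\|_{TV} = \|\nu^* - \mu\|_{TV} = R$, and $\int_\Sigma \ell(x)(\nu^* - \mu)(dx) = \frac{R}{2}(M - m)$. Hence,
by using (14) as a candidate of the maximizing distribution, the extremum Problem II.1 is
equivalent to

\[ D^+(R) = \int_\Sigma \ell(x)\nu^*(dx) = \frac{R}{2}\Big\{\sup_{x \in \Sigma} \ell(x) - \inf_{x \in \Sigma} \ell(x)\Big\} + E_\mu(\ell), \tag{15} \]

where $\nu^*$ satisfies the constraint $\|\xi^*\|_{TV} = \|\nu^* - \mu\|_{TV} = R$, it is normalized, $\nu^*(\Sigma) = 1$, and
$0 \le \nu^*(A) \le 1$ on any $A \in \mathcal{B}(\Sigma)$. Alternatively, the pay-off $\int_\Sigma \ell(x)\nu^*(dx)$ can be written as

\[ \int_\Sigma \ell(x)\nu^*(dx) = \int_{\Sigma^0} M\,\nu^*(dx) + \int_{\Sigma_0} m\,\nu^*(dx) + \int_{\Sigma \setminus (\Sigma^0 \cup \Sigma_0)} \ell(x)\mu(dx). \tag{16} \]

Hence, the optimal distribution $\nu^* \in \mathbb{B}_R(\mu)$ satisfies

\[ \int_{\Sigma^0} \nu^*(dx) = \mu(\Sigma^0) + \frac{R}{2} \in [0,1], \qquad \int_{\Sigma_0} \nu^*(dx) = \mu(\Sigma_0) - \frac{R}{2} \in [0,1], \]
\[ \nu^*(A) = \mu(A), \quad \forall A \subseteq \Sigma \setminus (\Sigma^0 \cup \Sigma_0). \tag{17} \]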
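
The construction (14)-(17) can be sketched on a finite alphabet as follows. This is a minimal illustration, assuming that $R/2 \le \mu(\Sigma_0)$ and $\mu(\Sigma^0) + R/2 \le 1$ so that the masses in (17) stay in $[0,1]$; the function names and the numerical vectors are hypothetical.

```python
def closed_form_payoff(ell, mu, radius):
    """Evaluate (15): (R/2) * (max ell - min ell) + E_mu[ell]."""
    mean = sum(l * m for l, m in zip(ell, mu))
    return 0.5 * radius * (max(ell) - min(ell)) + mean


def shifted_measure(ell, mu, radius):
    """Build nu* as in (14): move R/2 of mass from a minimiser to a maximiser."""
    hi = ell.index(max(ell))   # one point x^0 of Sigma^0
    lo = ell.index(min(ell))   # one point x_0 of Sigma_0
    if mu[lo] < radius / 2 or mu[hi] + radius / 2 > 1:
        raise ValueError("radius too large for the simple formula (15)")
    nu = list(mu)
    nu[hi] += radius / 2
    nu[lo] -= radius / 2
    return nu


ell = [4.0, 1.0, 2.0]
mu = [0.2, 0.5, 0.3]
R = 0.4
nu_star = shifted_measure(ell, mu, R)
payoff = sum(l * v for l, v in zip(ell, nu_star))
distance = sum(abs(v - m) for v, m in zip(nu_star, mu))
print(nu_star, payoff, closed_form_payoff(ell, mu, R), distance)
```

The pay-off under `nu_star` agrees with the closed form (15), and the distance from `mu` equals `R`, as required by the constraint.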

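Remark III.2.

(1) For $\mu \in \mathcal{M}_1(\Sigma)$ which do not include point mass, and for $f \in BC^+(\Sigma)$, if $\Sigma^0$ and $\Sigma_0$ are
countable, then (17) is $\mu(\Sigma^0) = \mu(\Sigma_0) = 0$, $\nu^*(\Sigma_0) = 0$, $\nu^*(\Sigma^0) = \frac{R}{2}$, $\nu^*(\Sigma \setminus (\Sigma^0 \cup \Sigma_0)) =
\mu(\Sigma \setminus (\Sigma^0 \cup \Sigma_0)) - \frac{R}{2}$.

(2) The first right side term in (15) is related to the oscillator seminorm of $f \in BM(\Sigma)$, called
global modulus of continuity, defined by $\mathrm{osc}(f) = \sup_{(x,y) \in \Sigma \times \Sigma} |f(x) - f(y)| = 2\inf_{\alpha \in \mathbb{R}} \|f -
\alpha\|$. For $f \in BM^+(\Sigma)$, $\mathrm{osc}(f) = \sup_{x \in \Sigma} |f(x)| - \inf_{x \in \Sigma} |f(x)|$.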
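
The two expressions for the oscillator seminorm in Remark III.2 (2) can be compared directly for a non-negative function on a finite set. The sketch below uses made-up values and hypothetical helper names; the infimum over constants is checked on a grid rather than proved.

```python
def osc(values):
    """Oscillation of a non-negative function given by its list of values."""
    return max(values) - min(values)


def sup_norm_shifted(values, a):
    """Sup norm of f - a for a constant a."""
    return max(abs(v - a) for v in values)


f = [3.0, 0.5, 2.0, 1.5]
midpoint = (max(f) + min(f)) / 2          # candidate minimiser of ||f - a||
best = sup_norm_shifted(f, midpoint)

# Grid search over constants a in [0, 4] as a numerical cross-check.
grid_min = min(sup_norm_shifted(f, 0.01 * k) for k in range(401))
print(osc(f), 2 * best, 2 * grid_min)
```

The midpoint of the range attains the infimum, so `2 * best` coincides with `osc(f)`.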
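
B. Equivalent Extremum Problem of $R^-(D)$

Next, we proceed with the abstract formulation of Problem II.2. Consider the constraint of
Problem II.2, for $\ell \in BC^+(\Sigma)$, with $\xi = \nu - \mu$. Then the following inequalities hold.

\[ \begin{aligned} \int_\Sigma \ell(x)\nu(dx) &= \int_\Sigma \ell(x)\big(\xi^+(dx) - \xi^-(dx)\big) + \int_\Sigma \ell(x)\mu(dx) \\ &\ge \inf_{x \in \Sigma} \ell(x)\,\xi^+(\Sigma) - \sup_{x \in \Sigma} \ell(x)\,\xi^-(\Sigma) + \int_\Sigma \ell(x)\mu(dx) \\ &= \inf_{x \in \Sigma} \ell(x)\,\frac{\|\xi\|_{TV}}{2} - \sup_{x \in \Sigma} \ell(x)\,\frac{\|\xi\|_{TV}}{2} + \int_\Sigma \ell(x)\mu(dx) \\ &= \Big\{\inf_{x \in \Sigma} \ell(x) - \sup_{x \in \Sigma} \ell(x)\Big\} \frac{\|\xi\|_{TV}}{2} + \int_\Sigma \ell(x)\mu(dx). \end{aligned} \tag{18} \]

The lower bound on the right hand side of (18) is achieved by choosing $\xi^* \in \widetilde{\mathbb{B}}_R(\mu)$ as follows:

\[ \xi^*(dx) = \nu^*(dx) - \mu(dx) = \frac{R}{2}\big(\delta_{x_0}(dx) - \delta_{x^0}(dx)\big). \tag{19} \]

This is a signed measure with total variation $\|\xi^*\|_{TV} = \|\nu^* - \mu\|_{TV} = R$. Hence, by using (19)
as a candidate of the minimizing distribution, (18) is equivalent to

\[ \int_\Sigma \ell(x)\nu^*(dx) = \frac{R}{2}\Big\{\inf_{x \in \Sigma} \ell(x) - \sup_{x \in \Sigma} \ell(x)\Big\} + E_\mu(\ell). \tag{20} \]

Solving the above equation with respect to $R$, the extremum Problem II.2 (for $D < E_\mu(\ell)$) is
equivalent to

\[ R^-(D) = \frac{2\big(D - E_\mu(\ell)\big)}{\Big\{\inf_{x \in \Sigma} \ell(x) - \sup_{x \in \Sigma} \ell(x)\Big\}}, \tag{21} \]

where $\nu^*$ satisfies the constraint $\int_\Sigma \ell(x)\nu^*(dx) = D$, it is normalized, $\nu^*(\Sigma) = 1$, and $0 \le \nu^*(A) \le 1$
on any $A \in \mathcal{B}(\Sigma)$. We can now identify $R_{\max}$ and $D_{\max}$ described in Lemma III.1. These are
stated as a corollary.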

Corollary III.3. The values of $R_{\max}$ and $D_{\max}$ described in Lemma III.1 are given by
$R_{\max} = 2\big(1 - \mu(\Sigma^0)\big)$ and $D_{\max} = \int_\Sigma \ell(x)\mu(dx)$.

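Proof: Concerning $R_{\max}$, we know that $D^+(R) \le \sup_{x \in \Sigma} \ell(x)$, $\forall R \ge 0$, hence $D^+(R_{\max})$ can
be at most $\sup_{x \in \Sigma} \ell(x)$. Since $D^+(R)$ is non-decreasing, $D^+(R_{\max}) \le D^+(R) \le \sup_{x \in \Sigma} \ell(x)$
for any $R \ge R_{\max}$. Consider a $\nu$ that achieves this supremum. Let $\mu(\Sigma^0)$ and $\nu(\Sigma^0)$ denote
the nominal and true probability measures on $\Sigma^0$, respectively. If $\nu(\Sigma^0) = 1$ then $\nu(\Sigma \setminus \Sigma^0) = 0$.
Therefore,

\[ \begin{aligned} \|\nu - \mu\|_{TV} &= \sum_{x \in \Sigma^0} |\nu(x) - \mu(x)| + \sum_{x \in \Sigma \setminus \Sigma^0} |\nu(x) - \mu(x)| \overset{(a)}{=} \sum_{x \in \Sigma^0} |\nu(x) - \mu(x)| + \sum_{x \in \Sigma \setminus \Sigma^0} |{-\mu(x)}| \\ &\overset{(b)}{=} \sum_{x \in \Sigma^0} \nu(x) - \sum_{x \in \Sigma^0} \mu(x) + \sum_{x \in \Sigma \setminus \Sigma^0} \mu(x) = 1 - \sum_{x \in \Sigma^0} \mu(x) + \sum_{x \in \Sigma \setminus \Sigma^0} \mu(x) \\ &= 2\Big(1 - \sum_{x \in \Sigma^0} \mu(x)\Big) = 2\big(1 - \mu(\Sigma^0)\big), \end{aligned} \]

where (a) follows due to $\nu(\Sigma \setminus \Sigma^0) = 0$, which implies $\nu(x) = 0$ for any $x \in \Sigma \setminus \Sigma^0$, and (b) follows
because $\nu(x) \ge \mu(x)$ for all $x \in \Sigma^0$. Therefore, $R_{\max} = 2(1 - \mu(\Sigma^0))$ implies that $D^+(R_{\max}) =
\sup_{x \in \Sigma} \ell(x)$. Hence, $D^+(R) = \sup_{x \in \Sigma} \ell(x)$ for any $R \ge R_{\max}$.

Concerning $D_{\max}$, we know that $R^-(D) \ge 0$ for all $D \ge 0$, hence $R^-(D_{\max})$ can be at least zero.
Let $D_{\max} = \int_\Sigma \ell(x)\mu(dx)$; then it is obvious that $R^-(D_{\max}) = 0$. Since $R^-(D)$ is non-increasing,
$0 \le R^-(D) \le R^-(D_{\max})$ for any $D \ge D_{\max}$. Hence, $R^-(D) = 0$ for any $D \ge D_{\max}$. ■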
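
For a finite alphabet, the two quantities in Corollary III.3 are straightforward to evaluate. The sketch below uses hypothetical helper names and made-up vectors; it also builds one measure supported on $\Sigma^0$ that dominates $\mu$ there, and confirms that its distance from $\mu$ equals $R_{\max}$, mirroring steps (a) and (b) of the proof.

```python
def r_max(ell, mu):
    """R_max = 2 * (1 - mu(Sigma^0)), with Sigma^0 the maximisers of ell."""
    top = max(ell)
    mass_on_max = sum(m for l, m in zip(ell, mu) if l == top)
    return 2.0 * (1.0 - mass_on_max)


def d_max(ell, mu):
    """D_max is the nominal average of ell."""
    return sum(l * m for l, m in zip(ell, mu))


ell = [4.0, 1.0, 4.0, 2.0]
mu = [0.1, 0.5, 0.2, 0.2]

# A measure concentrated on Sigma^0 = {0, 2} with nu >= mu on that set.
nu = [0.8, 0.0, 0.2, 0.0]
distance = sum(abs(v - m) for v, m in zip(nu, mu))
payoff = sum(l * v for l, v in zip(ell, nu))
print(r_max(ell, mu), d_max(ell, mu), distance, payoff)
```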

IV. Characterization of Extremum Measures for Finite Alphabets 

This section uses the results of Section III to compute closed form expressions for the
extremum measures $\nu^*$ for any $R \in [0,2]$, when $\Sigma$ is a finite alphabet space, to give intuition
into the solution procedure. This is done by identifying the sets $\Sigma^0$, $\Sigma_0$, $\Sigma \setminus (\Sigma^0 \cup \Sigma_0)$, and the
measure $\nu^*$ on these sets for any $R \in [0,2]$. Although this can be done for probability measures
on complete separable metric spaces (Polish spaces) $(\Sigma, d_\Sigma)$, and for $\ell \in BM^+(\Sigma)$, $\ell \in BC^+(\Sigma)$,
$L^{\infty,+}(\Sigma, \mathcal{B}(\Sigma), \nu)$, we prefer to discuss the finite alphabet case to gain additional insight into
these problems. At the end of this section we shall use the finite alphabet case to discuss the
extensions to countable alphabet and to $\ell \in L^{\infty,+}(\Sigma, \mathcal{B}(\Sigma), \nu)$.

Consider the finite alphabet case $(\Sigma, \mathcal{M})$, where $\mathrm{card}(\Sigma) = |\Sigma|$ is finite and $\mathcal{M} = 2^{|\Sigma|}$. Thus, $\nu$
and $\mu$ are point mass distributions on $\Sigma$. Define the set of probability vectors on $\Sigma$ by

\[ \mathbb{P}(\Sigma) = \Big\{p = (p_1, \ldots, p_{|\Sigma|}) : p_i \ge 0,\ i = 1, \ldots, |\Sigma|,\ \sum_{i \in \Sigma} p_i = 1\Big\}. \tag{22} \]

Thus, $p \in \mathbb{P}(\Sigma)$ is a probability vector in $\mathbb{R}_+^{|\Sigma|}$. Also let $\ell = \{\ell_1, \ldots, \ell_{|\Sigma|}\}$ so that $\ell \in \mathbb{R}_+^{|\Sigma|}$ (i.e., the
set of non-negative vectors of dimension $|\Sigma|$).
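
A. Problem II.1: Finite Alphabet Case

Suppose $\nu \in \mathbb{P}(\Sigma)$ is the true probability vector and $\mu \in \mathbb{P}(\Sigma)$ is the nominal fixed probability
vector. The extremum problem is defined by

\[ D^+(R) = \max_{\nu \in \mathbb{B}_R(\mu)} \sum_{i \in \Sigma} \ell_i \nu_i, \tag{23} \]

where

\[ \mathbb{B}_R(\mu) = \Big\{\nu \in \mathbb{P}(\Sigma) : \|\nu - \mu\|_{TV} = \sum_{i \in \Sigma} |\nu_i - \mu_i| \le R\Big\}. \tag{24} \]

Next, we apply the results of Section III to characterize the optimal $\nu^*$ for any $R \in [0,2]$. By
defining $\xi_i = \nu_i - \mu_i$, $i = 1, \ldots, |\Sigma|$, and $\xi \in \mathcal{M}_0(\Sigma)$, Problem II.1 can be reformulated as follows.

\[ \max_{\nu \in \mathbb{B}_R(\mu)} \sum_{i \in \Sigma} \ell_i \nu_i \longrightarrow \sum_{i \in \Sigma} \ell_i \mu_i + \max_{\xi \in \widetilde{\mathbb{B}}_R(\mu)} \sum_{i \in \Sigma} \ell_i \xi_i. \tag{25} \]

Note that $\xi \in \widetilde{\mathbb{B}}_R(\mu)$ is described by the constraints

\[ \sum_{i \in \Sigma} |\xi_i| \le R, \qquad \sum_{i \in \Sigma} \xi_i = 0, \qquad 0 \le \xi_i + \mu_i \le 1, \quad \forall i \in \Sigma. \tag{26} \]

The positive and negative variation of the signed measure $\xi$ are defined by

\[ \xi_i^+ = \begin{cases} \xi_i, & \text{if } \xi_i \ge 0, \\ 0, & \text{if } \xi_i < 0, \end{cases} \qquad \xi_i^- = \begin{cases} 0, & \text{if } \xi_i \ge 0, \\ -\xi_i, & \text{if } \xi_i < 0. \end{cases} \]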

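Therefore,

\[ \sum_{i \in \Sigma} \xi_i = \sum_{i \in \Sigma} \xi_i^+ - \sum_{i \in \Sigma} \xi_i^-, \qquad \sum_{i \in \Sigma} |\xi_i| = \sum_{i \in \Sigma} \xi_i^+ + \sum_{i \in \Sigma} \xi_i^-, \]

and hence,

\[ \sum_{i \in \Sigma} \xi_i^+ = \frac{\alpha}{2}, \qquad \sum_{i \in \Sigma} \xi_i^- = \frac{\alpha}{2}, \]

and

\[ \sum_{i \in \Sigma} \xi_i = 0, \qquad \alpha = \sum_{i \in \Sigma} |\xi_i| \le R. \tag{27} \]

In addition,

\[ \sum_{i \in \Sigma} \ell_i \xi_i = \sum_{i \in \Sigma} \ell_i \xi_i^+ - \sum_{i \in \Sigma} \ell_i \xi_i^-. \tag{28} \]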
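
The identities (27) and (28) are easy to verify numerically. The following sketch, with hypothetical names and example vectors, splits $\xi = \nu - \mu$ into its positive and negative variations and checks that each carries half of the total variation.

```python
def jordan_parts(nu, mu):
    """Return xi = nu - mu together with its positive and negative parts."""
    xi = [v - m for v, m in zip(nu, mu)]
    xi_plus = [max(x, 0.0) for x in xi]
    xi_minus = [max(-x, 0.0) for x in xi]
    return xi, xi_plus, xi_minus


ell = [4.0, 1.0, 2.0]
mu = [0.2, 0.5, 0.3]
nu = [0.45, 0.3, 0.25]

xi, xi_plus, xi_minus = jordan_parts(nu, mu)
alpha = sum(abs(x) for x in xi)                       # total variation of xi
lhs = sum(l * x for l, x in zip(ell, xi))             # left side of (28)
rhs = (sum(l * p for l, p in zip(ell, xi_plus))
       - sum(l * n for l, n in zip(ell, xi_minus)))   # right side of (28)
print(alpha, sum(xi_plus), sum(xi_minus), lhs, rhs)
```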



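Define the maximum and minimum values of the sequence $\{\ell_1, \ldots, \ell_{|\Sigma|}\}$ by $\ell_{\max} = \max_{i \in \Sigma} \ell_i$,
$\ell_{\min} = \min_{i \in \Sigma} \ell_i$, and its corresponding support sets by $\Sigma^0 = \{i \in \Sigma : \ell_i = \ell_{\max}\}$, $\Sigma_0 = \{i \in \Sigma :
\ell_i = \ell_{\min}\}$.

January 22, 2013 DRAFT

For all remaining sequence, $\{\ell_i : i \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)\}$, and for $1 \le r \le |\Sigma \setminus (\Sigma^0 \cup \Sigma_0)|$ define
recursively

\[ \Sigma_k = \Big\{i \in \Sigma : \ell_i = \min\Big\{\ell_\alpha : \alpha \in \Sigma \setminus \Big(\Sigma^0 \cup \bigcup_{j=1}^{k} \Sigma_{j-1}\Big)\Big\}\Big\}, \quad k \in \{1, 2, \ldots, r\}, \tag{29} \]

till all the elements of $\Sigma$ are exhausted (i.e., $k$ is at most $|\Sigma \setminus (\Sigma^0 \cup \Sigma_0)|$). Define the corresponding
values of the sequence of sets in (29) by

\[ \ell(\Sigma_k) = \min_{i \in \Sigma \setminus (\Sigma^0 \cup \bigcup_{j=1}^{k} \Sigma_{j-1})} \ell_i, \quad k \in \{1, 2, \ldots, r\}, \]

where $r$ is the number of $\Sigma_k$ sets, which is at most $|\Sigma \setminus (\Sigma^0 \cup \Sigma_0)|$; for example, when $k = 1$,
$\ell(\Sigma_1) = \min_{i \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)} \ell_i$. The following theorem characterizes the solution of Problem II.1.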

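Theorem IV.1. The solution of the finite alphabet version of Problem II.1 is given by

\[ D^+(R) = \ell_{\max}\,\nu^*(\Sigma^0) + \ell_{\min}\,\nu^*(\Sigma_0) + \sum_{k=1}^{r} \ell(\Sigma_k)\,\nu^*(\Sigma_k). \tag{30} \]

Moreover, the optimal probabilities are given by

\[ \nu^*(\Sigma^0) \triangleq \sum_{i \in \Sigma^0} \nu_i^* = \sum_{i \in \Sigma^0} \mu_i + \alpha, \tag{31a} \]
\[ \nu^*(\Sigma_0) \triangleq \sum_{i \in \Sigma_0} \nu_i^* = \Big(\sum_{i \in \Sigma_0} \mu_i - \alpha\Big)^+, \tag{31b} \]
\[ \nu^*(\Sigma_k) \triangleq \sum_{i \in \Sigma_k} \nu_i^* = \Big(\sum_{i \in \Sigma_k} \mu_i - \Big(\alpha - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big)^+\Big)^+, \tag{31c} \]
\[ \alpha = \min\Big(\frac{R}{2},\ 1 - \sum_{i \in \Sigma^0} \mu_i\Big), \tag{31d} \]

where $k = 1, 2, \ldots, r$ and $r$ is the number of $\Sigma_k$ sets, which is at most $|\Sigma \setminus (\Sigma^0 \cup \Sigma_0)|$.

Proof: The derivation of the Theorem is based on a sequence of Lemmas, Propositions and
Corollaries which are presented below. ■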
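
The structure of the theorem suggests a simple water-filling style computation: add mass $\alpha$ to the maximizers of $\ell$ and remove the same amount from the remaining symbols in order of increasing $\ell$. The sketch below is one possible reading of (30)-(31d) under that assumption; the function name `maximize_payoff`, the choice to place the added mass on a single maximizer, and the example data are illustrative and not taken from the paper.

```python
def maximize_payoff(ell, mu, radius):
    """Greedy construction of a maximising measure on a finite alphabet."""
    top = max(ell)
    top_idx = [i for i, l in enumerate(ell) if l == top]           # Sigma^0
    alpha = min(radius / 2.0, 1.0 - sum(mu[i] for i in top_idx))   # mass to move
    nu = list(mu)
    # Any split of alpha over Sigma^0 gives the same pay-off; use one point.
    nu[top_idx[0]] += alpha
    remaining = alpha
    # Drain Sigma_0 first, then Sigma_1, Sigma_2, ... (increasing ell).
    for i in sorted(range(len(ell)), key=lambda j: ell[j]):
        if remaining <= 0 or ell[i] == top:
            break
        take = min(nu[i], remaining)
        nu[i] -= take
        remaining -= take
    payoff = sum(l * v for l, v in zip(ell, nu))
    return payoff, nu


ell = [4.0, 1.0, 2.0, 3.0]
mu = [0.1, 0.2, 0.3, 0.4]
payoff, nu_star = maximize_payoff(ell, mu, 0.6)
print(payoff, nu_star)
```

With `R = 0.6` the mass $\alpha = 0.3$ exceeds $\mu(\Sigma_0) = 0.2$, so $\Sigma_0$ is emptied and the remaining $0.1$ is taken from $\Sigma_1$, which is the situation covered by (31b) and (31c).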
The following Lemma is a direct consequence of Section III-A. 

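Lemma IV.2. Consider the finite alphabet version of Problem II.1. Then the following bounds
hold.

1. Upper Bound.

\[ \sum_{i \in \Sigma} \ell_i \xi_i^+ \le \ell_{\max}\Big(\frac{\alpha}{2}\Big). \tag{32} \]

The upper bound holds with equality if

\[ \sum_{i \in \Sigma^0} \mu_i + \frac{\alpha}{2} \le 1, \qquad \sum_{i \in \Sigma^0} \xi_i^+ = \frac{\alpha}{2}, \qquad \xi_i^+ = 0 \ \text{for } i \in \Sigma \setminus \Sigma^0, \tag{33} \]

and the optimal probability on $\Sigma^0$ is given by

\[ \nu^*(\Sigma^0) \triangleq \sum_{i \in \Sigma^0} \nu_i^* = \min\Big(1,\ \sum_{i \in \Sigma^0} \mu_i + \frac{\alpha}{2}\Big). \tag{34} \]

2. Lower Bound.

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- \ge \ell_{\min}\Big(\frac{\alpha}{2}\Big). \tag{35} \]

The lower bound holds with equality if

\[ \sum_{i \in \Sigma_0} \mu_i - \frac{\alpha}{2} \ge 0, \qquad \sum_{i \in \Sigma_0} \xi_i^- = \frac{\alpha}{2}, \qquad \xi_i^- = 0 \ \text{for } i \in \Sigma \setminus \Sigma_0, \tag{36} \]

and the optimal probability on $\Sigma_0$ is given by

\[ \nu^*(\Sigma_0) \triangleq \sum_{i \in \Sigma_0} \nu_i^* = \Big(\sum_{i \in \Sigma_0} \mu_i - \frac{\alpha}{2}\Big)^+. \tag{37} \]

Moreover, under the conditions in 1 and 2 the maximum pay-off is given by

\[ D^+(R) = \frac{\alpha}{2}\{\ell_{\max} - \ell_{\min}\} + \sum_{i \in \Sigma} \ell_i \mu_i. \tag{38} \]

Proof: Follows from Section III-A. ■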

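Proposition IV.3. If $\sum_{i \in \Sigma^0} \mu_i + \frac{\alpha}{2} = 1$ then $D^+(R) = \ell_{\max}$.

Proof: Under the stated condition $\sum_{i \in \Sigma^0} \nu_i^* = 1$ and therefore $\sum_{i \in \Sigma \setminus \Sigma^0} \nu_i^* = 0$, hence $\nu_i^* = 0$
for all $i \in \Sigma \setminus \Sigma^0$. Then the maximum pay-off (23) is given by

\[ D^+(R) = \sum_{i \in \Sigma^0} \ell_i \nu_i^* + \sum_{i \in \Sigma \setminus \Sigma^0} \ell_i \nu_i^* = \ell_{\max} \sum_{i \in \Sigma^0} \nu_i^* = \ell_{\max}. \]

■

The lower bound of Lemma IV.2 characterizes the extremum solution for $\sum_{i \in \Sigma_0} \mu_i - \frac{\alpha}{2} \ge 0$. Next,
the characterization of the extremum solution is discussed when this condition is violated.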
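
Lemma IV.4. If $\sum_{i \in \Sigma_0} \mu_i - \frac{\alpha}{2} \le 0$, then

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- \ge \ell(\Sigma_1)\Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big) + \ell_{\min} \sum_{i \in \Sigma_0} \mu_i. \tag{39} \]

Moreover, equality holds if

\[ \sum_{i \in \Sigma_0} \xi_i^- = \sum_{i \in \Sigma_0} \mu_i, \tag{40a} \]
\[ \sum_{i \in \Sigma_1} \xi_i^- = \Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big), \tag{40b} \]
\[ \sum_{i \in \Sigma_0} \mu_i + \sum_{i \in \Sigma_1} \mu_i \ge \frac{\alpha}{2}, \tag{40c} \]
\[ \xi_i^- = 0 \ \text{for all } i \in \Sigma \setminus (\Sigma_0 \cup \Sigma_1), \tag{40d} \]

and the optimal probability on $\Sigma_1$ is given by

\[ \nu^*(\Sigma_1) \triangleq \sum_{i \in \Sigma_1} \nu_i^* = \Big(\sum_{i \in \Sigma_1} \mu_i - \Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big)^+\Big)^+. \tag{41} \]

Proof: First, we show that the inequality holds.

\[ \sum_{i \in \Sigma \setminus \Sigma_0} \ell_i \xi_i^- \ge \min_{j \in \Sigma \setminus \Sigma_0} \ell_j \sum_{i \in \Sigma \setminus \Sigma_0} \xi_i^- = \ell(\Sigma_1) \sum_{i \in \Sigma \setminus \Sigma_0} \xi_i^-. \]

Hence,

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- - \sum_{i \in \Sigma_0} \ell_i \xi_i^- \ge \ell(\Sigma_1)\Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big), \]

which implies

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- \ge \ell(\Sigma_1)\Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big) + \ell_{\min} \sum_{i \in \Sigma_0} \mu_i, \]

establishing (39). Next, we show under the stated conditions that equality holds.

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- = \sum_{i \in \Sigma_0} \ell_i \xi_i^- + \sum_{i \in \Sigma_1} \ell_i \xi_i^- = \ell_{\min} \sum_{i \in \Sigma_0} \mu_i + \ell(\Sigma_1) \sum_{i \in \Sigma_1} \xi_i^- = \ell_{\min} \sum_{i \in \Sigma_0} \mu_i + \ell(\Sigma_1)\Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big). \]

From (40b) we have that

\[ \sum_{i \in \Sigma_1} \xi_i^- = \Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big), \tag{42} \]

and hence,

\[ \sum_{i \in \Sigma_1} \nu_i^* = \sum_{i \in \Sigma_1} \mu_i - \Big(\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i\Big). \tag{43} \]

The optimal $\sum_{i \in \Sigma_1} \nu_i^*$ must satisfy $\frac{\alpha}{2} - \sum_{i \in \Sigma_0} \mu_i \ge 0$ and $\sum_{i \in \Sigma_1} \mu_i + \sum_{i \in \Sigma_0} \mu_i - \frac{\alpha}{2} \ge 0$. Hence, (41)
is obtained. ■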

Following the previous Lemma, which characterizes the extremum solution when $\sum_{i \in \Sigma_0} \mu_i - \frac{\alpha}{2} \le
0$, one can also characterize the optimum solution of extremum Problem II.1 when $\sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i -
\frac{\alpha}{2} \le 0$, for any $k \in \{1, 2, \ldots, r\}$.

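Corollary IV.5. For any $k \in \{1, 2, \ldots, r\}$, if $\sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i - \frac{\alpha}{2} \le 0$ then

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- \ge \ell(\Sigma_k)\Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big) + \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \ell_i \mu_i. \tag{44} \]

Moreover, equality holds if

\[ \sum_{i \in \Sigma_{j-1}} \xi_i^- = \sum_{i \in \Sigma_{j-1}} \mu_i, \quad \text{for all } j = 1, 2, \ldots, k, \tag{45a} \]
\[ \sum_{i \in \Sigma_k} \xi_i^- = \Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big), \tag{45b} \]
\[ \sum_{j=0}^{k} \sum_{i \in \Sigma_j} \mu_i \ge \frac{\alpha}{2}, \tag{45c} \]
\[ \xi_i^- = 0 \ \text{for all } i \in \Sigma \setminus (\Sigma_0 \cup \Sigma_1 \cup \ldots \cup \Sigma_k), \tag{45d} \]

and the optimal probability on the $\Sigma_k$ sets is given by

\[ \nu^*(\Sigma_k) \triangleq \sum_{i \in \Sigma_k} \nu_i^* = \Big(\sum_{i \in \Sigma_k} \mu_i - \Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big)^+\Big)^+. \tag{46} \]

Proof: Consider any $k \in \{1, 2, \ldots, r\}$. First, we show that the inequality holds. From the lower
bound we have that

\[ \sum_{i \in \Sigma \setminus \bigcup_{j=1}^{k} \Sigma_{j-1}} \ell_i \xi_i^- \ge \min_{m \in \Sigma \setminus \bigcup_{j=1}^{k} \Sigma_{j-1}} \ell_m \sum_{i \in \Sigma \setminus \bigcup_{j=1}^{k} \Sigma_{j-1}} \xi_i^- = \ell(\Sigma_k) \sum_{i \in \Sigma \setminus \bigcup_{j=1}^{k} \Sigma_{j-1}} \xi_i^-. \]

Hence,

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \ell_i \xi_i^- \ge \ell(\Sigma_k)\Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big), \]

which implies

\[ \sum_{i \in \Sigma} \ell_i \xi_i^- \ge \ell(\Sigma_k)\Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big) + \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \ell_i \mu_i. \]

Next, we show under the stated conditions that equality holds.

\[ \begin{aligned} \sum_{i \in \Sigma} \ell_i \xi_i^- &= \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \ell_i \xi_i^- + \sum_{i \in \Sigma_k} \ell_i \xi_i^- + \sum_{i \in \Sigma \setminus \bigcup_{j=0}^{k} \Sigma_j} \ell_i \xi_i^- \\ &= \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \ell_i \mu_i + \ell(\Sigma_k)\Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big). \end{aligned} \]

From (45b) we have that

\[ \sum_{i \in \Sigma_k} \xi_i^- = \Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big), \tag{47} \]

and hence,

\[ \sum_{i \in \Sigma_k} \nu_i^* = \sum_{i \in \Sigma_k} \mu_i - \Big(\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i\Big). \tag{48} \]

The optimal $\sum_{i \in \Sigma_k} \nu_i^*$ must satisfy $\frac{\alpha}{2} - \sum_{j=1}^{k} \sum_{i \in \Sigma_{j-1}} \mu_i \ge 0$ and $\sum_{j=0}^{k} \sum_{i \in \Sigma_j} \mu_i - \frac{\alpha}{2} \ge 0$. Hence,
(46) is obtained. ■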
Putting together Lemma IV.2, Proposition IV.3, Lemma IV.4, and Corollary IV.5 we obtain the
result of Theorem IV.1. Notice that the solution of Problem II.1 finds the partition of $\Sigma$ into
disjoint sets $\{\Sigma^0, \Sigma_0, \Sigma_1, \ldots, \Sigma_k\}$, where $\Sigma = \Sigma^0 \cup \Sigma_0 \cup \Sigma_1 \cup \ldots \cup \Sigma_k$, and the optimal measure $\nu^*(\cdot)$
on these sets.
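
The partition referred to above can be computed by grouping the indices of $\ell$ by value. The sketch below is illustrative only: the function name `partition_by_level` is hypothetical, indices are 0-based rather than 1-based, and the case where $\ell$ is constant (so that $\Sigma^0 = \Sigma_0 = \Sigma$) is handled by returning no lower levels.

```python
def partition_by_level(ell):
    """Group indices into Sigma^0 and the level sets Sigma_0, Sigma_1, ...

    Returns (sigma_top, levels), where sigma_top holds the indices with
    ell_i = ell_max and levels[k] holds the indices of Sigma_k, ordered by
    increasing value of ell as in recursion (29).
    """
    top = max(ell)
    sigma_top = [i for i, l in enumerate(ell) if l == top]
    lower_values = sorted({l for l in ell if l != top})
    levels = [[i for i, l in enumerate(ell) if l == v] for v in lower_values]
    return sigma_top, levels


ell = [3.0, 1.0, 2.0, 3.0, 1.0, 5.0]
sigma_top, levels = partition_by_level(ell)
print(sigma_top)   # indices of Sigma^0
print(levels)      # [Sigma_0, Sigma_1, Sigma_2]
```

Here `levels[0]` plays the role of $\Sigma_0$ and the remaining entries are $\Sigma_1, \ldots, \Sigma_r$ with $r = 2$.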



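B. Problem II.2: Finite Alphabet Case

Consider Problem II.2, and follow the procedure utilized to derive the solution of Problem II.1
(e.g., Section IV-A). Let $\xi_i = \nu_i - \mu_i = \xi_i^+ - \xi_i^-$ be the signed measure decomposition of $\xi$. We
know that $\sum_{i \in \Sigma} \xi_i = 0$ and so $\sum_{i \in \Sigma} \xi_i^+ = \sum_{i \in \Sigma} \xi_i^-$. Also

\[ \sum_{i \in \Sigma} |\nu_i - \mu_i| = \sum_{i \in \Sigma} |\xi_i| = \sum_{i \in \Sigma} \xi_i^+ + \sum_{i \in \Sigma} \xi_i^- = \alpha, \qquad \sum_{i \in \Sigma} \xi_i^+ = \sum_{i \in \Sigma} \xi_i^- = \frac{\alpha}{2}. \tag{49} \]

The average constraint can be written as follows

\[ \sum_{i \in \Sigma} \ell_i \nu_i = \sum_{i \in \Sigma} \ell_i (\xi_i + \mu_i) = \sum_{i \in \Sigma} \ell_i \xi_i + \sum_{i \in \Sigma} \ell_i \mu_i = \sum_{i \in \Sigma} \ell_i \xi_i^+ - \sum_{i \in \Sigma} \ell_i \xi_i^- + \sum_{i \in \Sigma} \ell_i \mu_i \le D. \tag{50} \]

Define the maximum and minimum values of the sequence by $\ell_{\max} = \max_{i \in \Sigma} \ell_i$, $\ell_{\min} = \min_{i \in \Sigma} \ell_i$,
and its corresponding support sets by $\Sigma^0 = \{i \in \Sigma : \ell_i = \ell_{\max}\}$, $\Sigma_0 = \{i \in \Sigma : \ell_i = \ell_{\min}\}$. For all
remaining sequence, $\{\ell_i : i \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)\}$, and for $1 \le r \le |\Sigma \setminus (\Sigma^0 \cup \Sigma_0)|$ define recursively

\[ \Sigma^k = \Big\{i \in \Sigma : \ell_i = \max\Big\{\ell_\alpha : \alpha \in \Sigma \setminus \Big(\Sigma_0 \cup \bigcup_{j=1}^{k} \Sigma^{j-1}\Big)\Big\}\Big\}, \quad k \in \{1, 2, \ldots, r\}, \tag{51} \]

till all the elements of $\Sigma$ are exhausted, and define the corresponding maximum value of $\ell$ on
the sequence of these sets by

\[ \ell(\Sigma^k) = \max_{i \in \Sigma \setminus (\Sigma_0 \cup \bigcup_{j=1}^{k} \Sigma^{j-1})} \ell_i, \quad k \in \{1, 2, \ldots, r\}, \]

where $r$ is the number of $\Sigma^k$ sets, which is at most $|\Sigma \setminus (\Sigma^0 \cup \Sigma_0)|$. Clearly, $\ell(\Sigma^1) = \max_{i \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)} \ell_i$
and so on. Note the analogy between (51) and (29) for Problem II.1. The main theorem which
characterizes the extremum solution of Problem II.2 is given below.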
characterizes the extremum solution of Problem II. 2 is given below. 

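Theorem IV.6. The solution of the finite alphabet version of Problem II.2 is given by

$$R^-(D) = \sum_{i\in\Sigma}|\nu_i^* - \mu_i|, \tag{52}$$

where the value of $R^-(D)$ is calculated as follows.

(1) If

$$\ell_{\min}\Big(\sum_{j=0}^{k}\sum_{i\in\Sigma^{j}}\mu_i + \sum_{i\in\Sigma_0}\mu_i\Big) + \sum_{j=k+1}^{r}\sum_{i\in\Sigma^{j}}\ell_i\mu_i \;\le\; D \;\le\; \ell_{\min}\Big(\sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i + \sum_{i\in\Sigma_0}\mu_i\Big) + \sum_{j=k}^{r}\sum_{i\in\Sigma^{j}}\ell_i\mu_i$$

then

$$R^-(D) = \frac{2\Big(D - \ell_{\min}\sum_{i\in\Sigma_0}\mu_i - \ell(\Sigma^k)\sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i - \sum_{j=k}^{r}\sum_{i\in\Sigma^{j}}\ell_i\mu_i\Big)}{\ell_{\min} - \ell(\Sigma^k)}. \tag{53}$$

(2) If $D \ge (\ell_{\min} - \ell_{\max})\sum_{i\in\Sigma^0}\mu_i + \sum_{i\in\Sigma}\ell_i\mu_i$ then

$$R^-(D) = \frac{2\big(D - \sum_{i\in\Sigma}\ell_i\mu_i\big)}{\ell_{\min} - \ell_{\max}}. \tag{54}$$

Moreover, the optimal probabilities are given by

$$\nu^*(\Sigma_0) \triangleq \sum_{i\in\Sigma_0}\nu_i^* = \sum_{i\in\Sigma_0}\mu_i + \alpha, \tag{55a}$$

$$\nu^*(\Sigma^0) \triangleq \sum_{i\in\Sigma^0}\nu_i^* = \Big(\sum_{i\in\Sigma^0}\mu_i - \alpha\Big)^+, \tag{55b}$$

$$\nu^*(\Sigma^k) \triangleq \sum_{i\in\Sigma^k}\nu_i^* = \Big(\sum_{i\in\Sigma^k}\mu_i - \Big(\alpha - \sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i\Big)^+\Big)^+, \tag{55c}$$

$$\alpha = \min\Big(\frac{R^-(D)}{2},\; 1 - \sum_{i\in\Sigma_0}\mu_i\Big), \tag{55d}$$

where $k = 1,2,\ldots,r$ and $r$ is the number of $\Sigma^k$ sets, which is at most $|\Sigma\setminus(\Sigma^0\cup\Sigma_0)|$.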

Proof: For the derivation of the Theorem see Appendix A. ■ 
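
The statement of Theorem IV.6 can be cross-checked with a simple greedy computation. The Python sketch below is an illustration and not the derivation used in the paper: it assumes, as the optimal probabilities (55a)-(55c) indicate, that the minimizing measure moves mass from the symbols with the largest $\ell_i$ to $\Sigma_0$ until the average constraint (50) is met. The function name `r_minus` is hypothetical.

```python
def r_minus(ell, mu, D):
    """Sketch: smallest sum(|nu[i] - mu[i]|) over probability vectors nu
    with sum(ell[i] * nu[i]) <= D (finite alphabet)."""
    l_min = min(ell)
    if D < l_min:
        raise ValueError("no probability vector satisfies the constraint")
    excess = sum(l * m for l, m in zip(ell, mu)) - D   # required reduction of the average
    nu = list(mu)
    if excess <= 0.0:
        return 0.0, nu                                 # nu = mu is already feasible
    sink = ell.index(l_min)                            # one symbol of Sigma_0
    moved = 0.0
    # Visit Sigma^0, Sigma^1, ... (decreasing ell); each unit of mass moved
    # from symbol i to Sigma_0 lowers the average by ell[i] - l_min.
    for i in sorted(range(len(ell)), key=lambda i: -ell[i]):
        gain = ell[i] - l_min
        if gain <= 0 or excess <= 0.0:
            break
        take = min(nu[i], excess / gain)
        nu[i] -= take
        nu[sink] += take
        moved += take
        excess -= take * gain
    return 2.0 * moved, nu                             # total variation is twice the moved mass
```

For `ell = [3, 2, 1]` and `mu = [0.2, 0.3, 0.5]`, the call `r_minus(ell, mu, 1.5)` gives 0.2, matching (54), and `r_minus(ell, mu, 1.2)` gives 0.6, matching (53) with $k = 1$.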

C. Solutions of Related Extremum Problems 

In Section II-A we discuss related extremum problems, whose solutions can be obtained from
those of Problem II.1 and Problem II.2. In this section we give the solution of the finite alphabet
version of the related extremum problems described by (8) and (10).

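Consider the finite alphabet version of (8), that is

$$R^+(D) = \sup_{\nu} \|\nu - \mu\|_{TV}, \tag{56}$$

where the supremum is taken over the measures $\nu \in \mathcal{M}_1(\Sigma)$ that satisfy the linear functional
constraint of (8). The solution of (56) is obtained from the solution of Problem II.1, by finding the
inverse mapping or by following a similar procedure to the one utilized to derive Theorem IV.6.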

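Theorem IV.7. The solution of the finite alphabet version of (56) is given by

$$R^+(D) = \sum_{i\in\Sigma}|\nu_i^* - \mu_i|, \tag{57}$$

where the value of $R^+(D)$ is calculated as follows.

(1) If

$$\ell_{\max}\Big(\sum_{j=1}^{k}\sum_{i\in\Sigma_{j-1}}\mu_i + \sum_{i\in\Sigma^0}\mu_i\Big) + \sum_{j=k}^{r}\sum_{i\in\Sigma_{j}}\ell_i\mu_i \;\le\; D \;\le\; \ell_{\max}\Big(\sum_{j=0}^{k}\sum_{i\in\Sigma_{j}}\mu_i + \sum_{i\in\Sigma^0}\mu_i\Big) + \sum_{j=k+1}^{r}\sum_{i\in\Sigma_{j}}\ell_i\mu_i$$

then

$$R^+(D) = \frac{2\Big(D - \ell_{\max}\sum_{i\in\Sigma^0}\mu_i - \ell(\Sigma_k)\sum_{j=1}^{k}\sum_{i\in\Sigma_{j-1}}\mu_i - \sum_{j=k}^{r}\sum_{i\in\Sigma_{j}}\ell_i\mu_i\Big)}{\ell_{\max} - \ell(\Sigma_k)}. \tag{58}$$

(2) If $D \le (\ell_{\max} - \ell_{\min})\sum_{i\in\Sigma_0}\mu_i + \sum_{i\in\Sigma}\ell_i\mu_i$ then

$$R^+(D) = \frac{2\big(D - \sum_{i\in\Sigma}\ell_i\mu_i\big)}{\ell_{\max} - \ell_{\min}}. \tag{59}$$

Moreover, the optimal probabilities are given by

$$\nu^*(\Sigma^0) \triangleq \sum_{i\in\Sigma^0}\nu_i^* = \sum_{i\in\Sigma^0}\mu_i + \alpha, \tag{60a}$$

$$\nu^*(\Sigma_0) \triangleq \sum_{i\in\Sigma_0}\nu_i^* = \Big(\sum_{i\in\Sigma_0}\mu_i - \alpha\Big)^+, \tag{60b}$$

$$\nu^*(\Sigma_k) \triangleq \sum_{i\in\Sigma_k}\nu_i^* = \Big(\sum_{i\in\Sigma_k}\mu_i - \Big(\alpha - \sum_{j=1}^{k}\sum_{i\in\Sigma_{j-1}}\mu_i\Big)^+\Big)^+, \tag{60c}$$

$$\alpha = \min\Big(\frac{R^+(D)}{2},\; 1 - \sum_{i\in\Sigma^0}\mu_i\Big), \tag{60d}$$

where $k = 1,2,\ldots,r$ and $r$ is the number of $\Sigma_k$ sets, which is at most $|\Sigma\setminus(\Sigma^0\cup\Sigma_0)|$.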
Consider the finite alphabet version of (10), that is

$$D^-(R) = \inf_{\nu\in\mathcal{M}_1(\Sigma):\,\|\nu-\mu\|_{TV}\le R}\; \sum_{i\in\Sigma}\ell_i\nu_i. \tag{61}$$

The solution of (61) is obtained from that of Problem II.1, but with a reverse computation on
the partition of $\Sigma$ and the mass of the extremum measure on the partition moving in the opposite
direction. Below, we give the main theorem.

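Theorem IV.8. The solution of the finite alphabet version of (61) is given by

$$D^-(R) = \ell_{\max}\,\nu^*(\Sigma^0) + \ell_{\min}\,\nu^*(\Sigma_0) + \sum_{k=1}^{r}\ell(\Sigma^k)\,\nu^*(\Sigma^k). \tag{62}$$

Moreover, the optimal probabilities are given by

$$\nu^*(\Sigma_0) \triangleq \sum_{i\in\Sigma_0}\nu_i^* = \sum_{i\in\Sigma_0}\mu_i + \alpha, \tag{63a}$$

$$\nu^*(\Sigma^0) \triangleq \sum_{i\in\Sigma^0}\nu_i^* = \Big(\sum_{i\in\Sigma^0}\mu_i - \alpha\Big)^+, \tag{63b}$$

$$\nu^*(\Sigma^k) \triangleq \sum_{i\in\Sigma^k}\nu_i^* = \Big(\sum_{i\in\Sigma^k}\mu_i - \Big(\alpha - \sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i\Big)^+\Big)^+, \tag{63c}$$

$$\alpha = \min\Big(\frac{R}{2},\; 1 - \sum_{i\in\Sigma_0}\mu_i\Big), \tag{63d}$$

where $k = 1,2,\ldots,r$ and $r$ is the number of $\Sigma^k$ sets, which is at most $|\Sigma\setminus(\Sigma^0\cup\Sigma_0)|$.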

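Remark IV.9. The statements of Theorems IV.1, IV.6, IV.7, IV.8 are also valid for the countable
alphabet case, because their derivations are not restricted to $\Sigma$ being a finite alphabet. They also
hold for any $\ell \in BC^+(\Sigma)$ as seen in Section III. The extensions of Theorems IV.1-IV.8 to
$\ell \in L^{\infty,+}(\Sigma, \mathcal{B}(\Sigma), \nu)$ can be shown as well; for example, $D^+(R)$ is given by

$$D^+(R) = \ell_{\max}\,\nu^*(\Sigma^0) + \ell_{\min}\,\nu^*(\Sigma_0) + \sum_{k=1}^{r}\ell(\Sigma_k)\,\nu^*(\Sigma_k), \tag{64}$$

where the optimal probabilities are given by

$$\nu^*(\Sigma^0) = \mu(\Sigma^0) + \alpha, \tag{65a}$$

$$\nu^*(\Sigma_0) = \big(\mu(\Sigma_0) - \alpha\big)^+, \tag{65b}$$

$$\nu^*(\Sigma_k) = \Big(\mu(\Sigma_k) - \Big(\alpha - \sum_{j=1}^{k}\mu(\Sigma_{j-1})\Big)^+\Big)^+, \tag{65c}$$

$$\alpha = \min\Big(\frac{R}{2},\; 1 - \mu(\Sigma^0)\Big), \tag{65d}$$

and $k$ is at most countable. We outline the main steps of the derivation. For any $n \in \mathbb{N}$, $\ell \in BC^+(\Sigma)$,
define $\ell_n \triangleq \ell \wedge n$ (i.e., the minimum between $\ell$ and $n$); then $\ell_n \in BC^+(\Sigma)$, and we have

$$\sup_{\nu\in B_R(\mu)} \int_\Sigma \ell_n(x)\,d\nu(x) = \frac{R}{2}\Big(\sup_{x\in\Sigma}\ell_n(x) - \inf_{x\in\Sigma}\ell_n(x)\Big) + \int_\Sigma \ell_n(x)\,\mu(dx).$$

For any $\nu \in B_R(\mu)$ we obtain the inequality

$$\int_\Sigma \ell(x)\,d\nu(x) = \sup_{n\in\mathbb{N}} \int_\Sigma \ell_n(x)\,\nu(dx) \le \sup_{n\in\mathbb{N}}\,\sup_{\nu\in B_R(\mu)} \int_\Sigma \ell_n(x)\,\nu(dx)$$

$$= \sup_{n\in\mathbb{N}} \Big\{ \frac{R}{2}\Big(\sup_{x\in\Sigma}\ell_n(x) - \inf_{x\in\Sigma}\ell_n(x)\Big) + \int_\Sigma \ell_n(x)\,d\mu(x) \Big\}$$

$$\le \sup_{n\in\mathbb{N}} \Big\{ \frac{R}{2}\Big(\sup_{x\in\Sigma}\ell_n(x) - \inf_{x\in\Sigma}\ell_n(x)\Big) \Big\} + \int_\Sigma \ell(x)\,d\mu(x).$$

Hence,

$$\sup_{\nu\in B_R(\mu)} \int_\Sigma \ell(x)\,d\nu(x) \le \frac{R}{2}\,\sup_{n\in\mathbb{N}} \Big\{ \sup_{x\in\Sigma}\ell_n(x) - \inf_{x\in\Sigma}\ell_n(x) \Big\} + \int_\Sigma \ell(x)\,d\mu(x).$$

Similarly, we can show that

$$\sup_{\nu\in B_R(\mu)} \int_\Sigma \ell(x)\,d\nu(x) \ge \frac{R}{2}\,\sup_{n\in\mathbb{N}} \Big\{ \sup_{x\in\Sigma}\ell_n(x) - \inf_{x\in\Sigma}\ell_n(x) \Big\} + \int_\Sigma \ell(x)\,d\mu(x).$$

Hence,

$$\sup_{\nu\in B_R(\mu)} \int_\Sigma \ell(x)\,d\nu(x) = \frac{R}{2}\,\sup_{n\in\mathbb{N}} \Big\{ \sup_{x\in\Sigma}\ell_n(x) - \inf_{x\in\Sigma}\ell_n(x) \Big\} + \int_\Sigma \ell(x)\,d\mu(x).$$

Utilizing the fact that $\sup_{n\in\mathbb{N}}\sup_{x\in\Sigma}\ell_n(x) = \sup_{n}\|\ell_n\|_{\infty,\nu}$ ($\|\ell\|_{\infty,\nu} = \inf_{A\in\mathcal{N}}\sup_{x\in A^c}\ell(x)$,
$\mathcal{N} = \{A \in \mathcal{B}(\Sigma) : \nu(A) = 0\}$, and similarly for the infimum) we obtain the results.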

V. Relation of Total Variational Distance to Other Metrics 

In this section, we discuss relations of the total variational distance to other distance metrics. 
We also refer to some applications with distance metrics that can be substituted by the total 
variational distance metric. 

$L_1$ Distance Uncertainty. Let $\sigma \in \mathcal{M}_1(\Sigma)$ be a fixed measure (as well as $\mu \in \mathcal{M}_1(\Sigma)$). Define
the Radon-Nikodym derivatives $\psi \triangleq \frac{d\mu}{d\sigma}$, $\varphi \triangleq \frac{d\nu}{d\sigma}$ (densities with respect to a fixed $\sigma \in \mathcal{M}_1(\Sigma)$).
Then,

$$\|\nu - \mu\|_{TV} = \int_\Sigma |\varphi(x) - \psi(x)|\,\sigma(dx).$$

Consider a subset of $B_R(\mu)$ defined by $B_{R,\sigma}(\mu) \triangleq \{\nu \in B_R(\mu) : \nu \ll \sigma,\ \mu \ll \sigma\} \subset B_R(\mu)$.
Then,

$$B_{R,\sigma}(\mu) = \Big\{ \varphi \in L_1(\sigma),\ \varphi \ge 0,\ \sigma\text{-a.s.} : \int_\Sigma |\varphi(x) - \psi(x)|\,\sigma(dx) \le R \Big\}.$$

Thus, under the absolute continuity of measures the total variational distance reduces to the $L_1$
distance. Robustness via $L_1$ distance uncertainty on the space of spectral densities is investigated
in the context of Wiener-Kolmogorov theory in an estimation and decision framework in [12],
[13]. The extremum problem described under (a) can be applied to abstract formulations of
minimax control and estimation, when the nominal system and uncertainty set are described by
spectral measures with respect to variational distance.

Relative Entropy Uncertainty Model. [4] The relative entropy of $\nu \in \mathcal{M}_1(\Sigma)$ with respect to
$\mu \in \mathcal{M}_1(\Sigma)$ is a mapping $H(\cdot|\cdot) : \mathcal{M}_1(\Sigma) \times \mathcal{M}_1(\Sigma) \to [0,\infty]$ defined by

$$H(\nu|\mu) \triangleq \begin{cases} \int_\Sigma \log\big(\frac{d\nu}{d\mu}\big)\,d\nu, & \text{if } \nu \ll \mu, \\ +\infty, & \text{otherwise.} \end{cases}$$

It is well known that $H(\nu|\mu) \ge 0$, $\forall \nu, \mu \in \mathcal{M}_1(\Sigma)$, while $H(\nu|\mu) = 0 \Leftrightarrow \nu = \mu$. Total variational
distance is bounded above by relative entropy via Pinsker's inequality giving

$$\|\nu - \mu\|_{TV} \le \sqrt{2H(\nu|\mu)}, \qquad \forall \nu, \mu \in \mathcal{M}_1(\Sigma). \tag{66}$$

Given a known or nominal probability measure $\mu \in \mathcal{M}_1(\Sigma)$, the uncertainty set based on relative
entropy is defined by $A_R(\mu) \triangleq \{\nu \in \mathcal{M}_1(\Sigma) : H(\nu|\mu) \le R\}$, where $R \in [0,\infty)$. Clearly, the
uncertainty set determined by the total variation distance $d_{TV}$ is larger than that determined by
the relative entropy. In other words, for every $r > 0$, in view of Pinsker's inequality (66):

$$\Big\{ \nu \in \mathcal{M}_1(\Sigma),\ \nu \ll \mu : H(\nu|\mu) \le \frac{r^2}{2} \Big\} \subseteq B_r(\mu) \equiv \big\{ \nu \in \mathcal{M}_1(\Sigma) : \|\nu - \mu\|_{TV} \le r \big\}.$$

Hence, even for those measures which satisfy $\nu \ll \mu$, the uncertainty set described by relative
entropy is a subset of the much larger total variation distance uncertainty set. Moreover, by
Pinsker's inequality, the total variation distance of probability measures yields a lower bound on
their relative entropy or Kullback-Leibler distance, namely $H(\nu|\mu) \ge \frac{1}{2}\|\nu-\mu\|_{TV}^2$, and hence
convergence in relative entropy of probability measures implies their convergence in total
variation distance.
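
Pinsker's inequality (66) can be checked numerically on a finite alphabet. The sketch below is illustrative only: the function names `total_variation` and `relative_entropy` are hypothetical, the total variation is taken as $\sum_i |\nu_i - \mu_i|$ (so it ranges over $[0,2]$, as in this paper), and natural logarithms are assumed.

```python
import math

def total_variation(nu, mu):
    """sum_i |nu_i - mu_i|, with values in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(nu, mu))

def relative_entropy(nu, mu):
    """H(nu|mu) in nats; +inf when nu is not absolutely continuous w.r.t. mu."""
    h = 0.0
    for a, b in zip(nu, mu):
        if a > 0.0:
            if b == 0.0:
                return math.inf      # nu puts mass where mu has none
            h += a * math.log(a / b)
    return h

nu, mu = [0.5, 0.5], [0.9, 0.1]
# Pinsker: the total variation never exceeds sqrt(2 * H(nu|mu)).
assert total_variation(nu, mu) <= math.sqrt(2.0 * relative_entropy(nu, mu))
```

The second branch of `relative_entropy` also illustrates the disadvantage discussed in this section: for measures that are not absolutely continuous the relative entropy is infinite, while the total variation distance remains finite.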

Over the last few years, the relative entropy uncertainty model has received particular attention
due to various properties (convexity, compact level sets), its simplicity and its connection to risk
sensitive pay-off, minimax games, and large deviations [7]-[11]. Recently, an uncertainty model
in the spirit of the Radon-Nikodym derivative was employed in [ ] for portfolio optimization under
uncertainty. Unfortunately, relative entropy uncertainty modeling has two disadvantages: 1) it
does not define a true metric on the space of measures; 2) relative entropy between two measures
is not defined if the measures are not absolutely continuous. The latter rules out the possibility
of measures $\nu \in \mathcal{M}_1(\Sigma)$ and $\mu \in \mathcal{M}_1(\tilde\Sigma)$, $\tilde\Sigma \subset \Sigma$, being defined on different spaces$^4$. It is one of
the main disadvantages in employing relative entropy in the context of uncertainty modelling for
stochastic controlled diffusions (or SDEs) [28]. Specifically, by invoking a change of measure
it can be shown that relative entropy modelling allows uncertainty in the drift coefficient of
stochastic controlled diffusions, but not in the diffusion coefficient, because the latter kind of
uncertainty leads to measures which are not absolutely continuous with respect to the nominal
measure [7].

4 This corresponds to the case in which the nominal system is a simplified version of the true system and is defined on a
lower dimension space.

Kakutani-Hellinger Distance. [3] Another measure of distance of two probability measures which
relates to their distance in variation is the Kakutani-Hellinger distance. Consider, as before,
$\nu \in \mathcal{M}_1(\Sigma)$, $\mu \in \mathcal{M}_1(\Sigma)$ and a fixed measure $\sigma \in \mathcal{M}_1(\Sigma)$ such that $\nu \ll \sigma$, $\mu \ll \sigma$, and define
$\varphi \triangleq \frac{d\nu}{d\sigma}$, $\psi \triangleq \frac{d\mu}{d\sigma}$. The Kakutani-Hellinger distance is a mapping $d_{KH} : L_1(\sigma) \times L_1(\sigma) \to [0,\infty)$
defined by

$$d_{KH}^2(\nu,\mu) \triangleq \frac{1}{2}\int \Big(\sqrt{\varphi(x)} - \sqrt{\psi(x)}\Big)^2 d\sigma(x). \tag{67}$$

Indeed, the function $d_{KH}$ given by (67) is a metric on the set of probability measures. A related
quantity is the Hellinger integral of measures $\nu \in \mathcal{M}_1(\Sigma)$ and $\mu \in \mathcal{M}_1(\Sigma)$ defined by

$$H(\nu,\mu) \triangleq \int \sqrt{\varphi(x)\psi(x)}\,d\sigma(x), \tag{68}$$

which is related to the Kakutani-Hellinger distance via $d_{KH}^2(\nu,\mu) = 1 - H(\nu,\mu)$. The relations
between distance in variation and Kakutani-Hellinger distance (and Hellinger integral) are given
by the following inequalities:

$$2\{1 - H(\nu,\mu)\} \le \|\nu - \mu\|_{TV} \le \sqrt{8\{1 - H(\nu,\mu)\}}, \tag{69}$$

$$\|\nu - \mu\|_{TV} \le 2\sqrt{1 - H^2(\nu,\mu)}, \tag{70}$$

$$2\,d_{KH}^2(\nu,\mu) \le \|\nu - \mu\|_{TV} \le \sqrt{8}\,d_{KH}(\nu,\mu). \tag{71}$$

The above inequalities imply that these distances define the same topology on the space of
probability measures on $(\Sigma, \mathcal{B}(\Sigma))$. Specifically, convergence in total variation of probability
measures defined on a metric space $(\Sigma, \mathcal{B}(\Sigma), d)$ implies their weak convergence with respect
to the Kakutani-Hellinger distance metric [3]. In [16], the Hellinger distance on the space of
spectral densities is used to define a pay-off subject to constraints in the context of approximation
theory.
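
The relations between the total variation distance, the Hellinger integral and the Kakutani-Hellinger distance can be verified numerically for discrete measures. The Python sketch below is an illustration under stated assumptions: the alphabet is finite, the densities are taken with respect to the counting measure (the quantities do not depend on the choice of dominating measure), the squared distance is taken as one half of the integral of the squared difference of root densities, and the function names are hypothetical.

```python
import math

def total_variation(nu, mu):
    """sum_i |nu_i - mu_i|, with values in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(nu, mu))

def hellinger_integral(nu, mu):
    """H(nu, mu) = sum_i sqrt(nu_i * mu_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(nu, mu))

def kh_distance(nu, mu):
    """d_KH(nu, mu) = sqrt(0.5 * sum_i (sqrt(nu_i) - sqrt(mu_i))**2)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(nu, mu)))

nu, mu = [0.5, 0.5], [0.9, 0.1]
tv, h, d = total_variation(nu, mu), hellinger_integral(nu, mu), kh_distance(nu, mu)

assert abs(d * d - (1.0 - h)) < 1e-12                   # d_KH^2 = 1 - H
assert 2 * (1 - h) <= tv <= math.sqrt(8 * (1 - h))      # two-sided bound via H
assert tv <= 2 * math.sqrt(1 - h * h)                   # sharper upper bound via H^2
assert 2 * d * d <= tv <= math.sqrt(8) * d              # two-sided bound via d_KH
```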

Levy-Prohorov Distance. [4] Given a metric space $(\Sigma, \mathcal{B}(\Sigma), d)$, and a family of probability
measures $\mathcal{M}_1(\Sigma)$ on $(\Sigma, \mathcal{B}(\Sigma))$, it is possible to "metrize" weak convergence of probability
measures, denoted by $P_n \xrightarrow{w} P$, where $\{P_n : n \in \mathbb{N}\} \subset \mathcal{M}_1(\Sigma)$, $P \in \mathcal{M}_1(\Sigma)$, via the so-called
Levy-Prohorov metric denoted by $d_{LP}(\nu,\mu)$. Thus, this metric is also a candidate for a measure of
proximity between two probability measures. The Levy-Prohorov metric is related to distance in
variation via the upper bound [3],

$$d_{LP}(\nu,\mu) \le \min\{\|\nu - \mu\|_{TV},\, 1\}, \qquad \forall \nu \in \mathcal{M}_1(\Sigma),\ \mu \in \mathcal{M}_1(\Sigma).$$

The function defined by $L(\nu,\mu) = \max\{d_{LP}(\nu,\mu),\, d_{LP}(\mu,\nu)\}$ is actually a distance metric (it
satisfies the properties of distance).

In view of the relations between different metrics, such as relative entropy, Levy-Prohorov
metric, Kakutani-Hellinger metric, it is clear that the problems discussed under (1)-(4) give
sub-optimal solutions to the same problems with distance in variation replaced by these metrics.

VI. Examples 

We will illustrate through simple examples how the optimal solution of the different extremum
problems behaves. In particular, we present calculations through Example VI-A for $D^+(R)$ and
$R^+(D)$, when the sequence $\ell = \{\ell_1\ \ell_2\ \ldots\ \ell_n\} \in \mathbb{R}^n_+$ consists of a number of $\ell_i$'s which are
equal, and calculations through Example VI-B for $R^-(D)$ and $D^-(R)$ when the $\ell_i$'s are not equal.
We further present calculations through Example VI-C for $D^+(R)$, $R^+(D)$ and $D^-(R)$, $R^-(D)$
using a large number of $\ell_i$'s.

A. Example A 

Let $\Sigma = \{i : i = 1,2,\ldots,8\}$ and for simplicity consider a descending sequence of lengths
$\ell = \{\ell \in \mathbb{R}^8_+ : \ell_1 = \ell_2 > \ell_3 = \ell_4 > \ell_5 > \ell_6 = \ell_7 > \ell_8\}$ with corresponding nominal probability vector
$\mu \in P_1(\Sigma)$. Specifically, let $\ell = [1, 1, 0.8, 0.8, 0.6, 0.4, 0.4, 0.2]$, and
$\mu = \big[\frac{23}{72}, \frac{13}{72}, \frac{10}{72}, \frac{9}{72}, \frac{8}{72}, \frac{4}{72}, \frac{3}{72}, \frac{2}{72}\big]$.
Note that the sets which correspond to the maximum, minimum and all the remaining lengths
are equal to $\Sigma^0 = \{1,2\}$, $\Sigma_0 = \{8\}$, $\Sigma_1 = \{7,6\}$, $\Sigma_2 = \{5\}$, $\Sigma_3 = \{4,3\}$. Figures 1(a)-(b) depict
the maximum linear functional pay-off subject to total variational constraint, $D^+(R)$, and the
optimal probabilities, both given by Theorem IV.1. Figures 1(c)-(d) depict the maximum total
variational pay-off subject to linear functional constraint, $R^+(D)$, and the optimal probabilities,
both given by Theorem IV.7. Recall Lemma III.1 case 1 and Corollary III.3. Figure 1(a) shows
that $D^+(R)$ is a non-decreasing concave function of $R$ and also that it is constant in $[R_{\max}, 2]$,
where $R_{\max} = 2(1 - \mu(\Sigma^0)) = 1$.




Fig. 1: Solution of Example A: (a) Optimum linear functional pay-off subject to total variational
constraint, $D^+(R)$; (b) Optimal probabilities of $D^+(R)$; (c) Optimum total variational pay-off
subject to linear functional constraint, $R^+(D)$; and, (d) Optimal probabilities of $R^+(D)$.
Panels (a) and (b) are plotted over $R \in [0,2]$.



B. Example B 

Let $\Sigma = \{i : i = 1,2,\ldots,8\}$ and for simplicity consider a descending sequence of lengths
$\ell = \{\ell \in \mathbb{R}^8_+ : \ell_1 > \ell_2 > \ell_3 > \ell_4 > \ell_5 > \ell_6 > \ell_7 > \ell_8\}$ with corresponding nominal probability vector
$\mu \in P_1(\Sigma)$. Specifically, let $\ell = [1, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]$ and
$\mu = \big[\frac{23}{72}, \frac{13}{72}, \frac{10}{72}, \frac{9}{72}, \frac{8}{72}, \frac{4}{72}, \frac{3}{72}, \frac{2}{72}\big]$.
Note that the sets which correspond to the maximum, minimum and all the remaining lengths are
equal to $\Sigma^0 = \{1\}$, $\Sigma_0 = \{8\}$, $\Sigma^1 = \{2\}$, $\Sigma^2 = \{3\}$, $\Sigma^3 = \{4\}$, $\Sigma^4 = \{5\}$, $\Sigma^5 = \{6\}$, $\Sigma^6 = \{7\}$. Figures
2(a)-(b) depict the minimum total variational pay-off subject to linear functional constraint,
$R^-(D)$, and the optimal probabilities, both given by Theorem IV.6. Figures 2(c)-(d) depict the
minimum linear functional pay-off subject to total variational constraint, $D^-(R)$, and the optimal
probabilities, both given by Theorem IV.8.





Fig. 2: Solution of Example B: (a) Optimum total variational pay-off subject to linear functional
constraint, $R^-(D)$; (b) Optimal probabilities of $R^-(D)$; (c) Optimum linear functional pay-off
subject to total variational constraint, $D^-(R)$; and, (d) Optimal probabilities of $D^-(R)$.
Panels (c) and (d) are plotted over $R \in [0,2]$.



Recall Lemma III.1 case 2 and Corollary III.3. Figure 2(a) shows that $R^-(D)$ is a non-increasing
convex function of $D$, $D \in [\ell_{\min}, \sum_{i\in\Sigma}\ell_i\mu_i)$. Note that for $D < \ell_{\min} = 0.2$ no solution exists and
$R^-(D)$ is zero in $[D_{\max}, \infty)$, where $D_{\max} = \sum_{i=1}^{8}\ell_i\mu_i = 0.73$.
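
The two end points of the curve in Example B can be reproduced with exact arithmetic. The sketch below is illustrative: it takes $\ell$ as given above and the nominal vector $\mu = (23, 13, 10, 9, 8, 4, 3, 2)/72$, which is an assumption of this sketch that is consistent with the reported value $D_{\max} = 0.73$. The value at $D = \ell_{\min}$ uses Proposition A.2 of the Appendix, $R^-(D) = 2(1 - \sum_{i\in\Sigma_0}\mu_i)$.

```python
from fractions import Fraction

# Lengths of Example B, written as exact fractions.
ell = [Fraction(x, 10) for x in (10, 8, 7, 6, 5, 4, 3, 2)]
# Assumed nominal probability vector (see the lead-in above).
mu = [Fraction(n, 72) for n in (23, 13, 10, 9, 8, 4, 3, 2)]

d_max = sum(l * m for l, m in zip(ell, mu))     # sum_i ell_i * mu_i; R^-(D) = 0 beyond this point
d_min = min(ell)                                # smallest D for which a solution exists
r_at_d_min = 2 * (1 - mu[ell.index(d_min)])     # all mass moved onto Sigma_0 = {8}

print(float(d_max), float(d_min), float(r_at_d_min))
```

Under these assumptions $D_{\max} = 527/720 \approx 0.73$, and $R^-(\ell_{\min}) = 35/18 \approx 1.94$.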
C. Example C

Let $\Sigma = \{i : i = 1,2,\ldots,50\}$ and consider a descending sequence of lengths $\ell = \{\ell \in \mathbb{R}^{50}_+\}$ with
corresponding nominal probability vector $\mu \in P_1(\Sigma)$. For display purposes the support sets are
denoted by $\Sigma_x^y$ where $x, y \in \{1,2,\ldots,16\}$; the subscript symbol $x$ corresponds to the support
sets of Problems $D^+(R)$, $R^+(D)$ and the superscript symbol $y$ corresponds to the support sets of
Problems $D^-(R)$ and $R^-(D)$. Let

ℓ = [20 20 20 20 19 19 19 18 17 17 16 14 14 13 13 13 13 12 10 10 10 10
10 9 9 9 8 8 8 8 8 8 8 7 7 6 5 4 3 3 3 3 3 3 2 2 2 2 1 1]

and

μ = [0.052 0.002 0.01 0.006 0.004 0.038 0.032 0.028 0.026 0.008 0.012 0.01 0.008
0.026 0.05 0.044 0.03 0.032 0.024 0.01 0.02 0.03 0.014 0.024 0.004 0.006 0.024
0.01 0.022 0.012 0.016 0.042 0.014 0.016 0.01 0.024 0.02 0.008 0.014 0.032 0.018
0.012 0.01 0.04 0.036 0.018 0.002 0.022 0.012 0.016].

Note that the sets which correspond to the maximum, minimum and all the remaining lengths
are equal to

$\Sigma^0 = \{1\text{-}4\}$, $\Sigma_0 = \{50,49\}$, $\Sigma_1^{16} = \{48\text{-}45\}$, $\Sigma_2^{15} = \{44\text{-}39\}$, $\Sigma_3^{14} = \{38\}$, $\Sigma_4^{13} = \{37\}$,
$\Sigma_5^{12} = \{36\}$, $\Sigma_6^{11} = \{35,34\}$, $\Sigma_7^{10} = \{33\text{-}27\}$, $\Sigma_8^{9} = \{26\text{-}24\}$, $\Sigma_9^{8} = \{23\text{-}19\}$, $\Sigma_{10}^{7} = \{18\}$,
$\Sigma_{11}^{6} = \{17\text{-}14\}$, $\Sigma_{12}^{5} = \{13,12\}$, $\Sigma_{13}^{4} = \{11\}$, $\Sigma_{14}^{3} = \{10\text{-}9\}$, $\Sigma_{15}^{2} = \{8\}$, $\Sigma_{16}^{1} = \{7\text{-}5\}$.

Figures 3(a)-(b) depict the maximum linear functional pay-off subject to total variational
constraint, $D^+(R)$, and the maximum total variational pay-off subject to linear functional constraint,
$R^+(D)$, given by Theorems IV.1 and IV.7, respectively. Figures 3(c)-(d) depict the minimum linear
functional pay-off subject to total variational constraint, $D^-(R)$, and the minimum total variational
pay-off subject to linear functional constraint, $R^-(D)$, given by Theorems IV.8 and IV.6, respectively.
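
The support sets of Example C can be recovered mechanically by grouping the indices according to the value of $\ell_i$. The following Python sketch is illustrative (the helper name `partition_by_level` is hypothetical); it lists the level sets from the largest length down, so the first set is $\Sigma^0$, the last is $\Sigma_0$, and the sets in between are $\Sigma^1, \ldots, \Sigma^{16}$ (equivalently $\Sigma_{16}, \ldots, \Sigma_1$).

```python
def partition_by_level(ell):
    """Group the 1-based indices of ell by value, from the largest value down."""
    levels = sorted(set(ell), reverse=True)
    return [(v, [i + 1 for i, l in enumerate(ell) if l == v]) for v in levels]

# Lengths of Example C (50 symbols, descending order).
ell = ([20] * 4 + [19] * 3 + [18] + [17] * 2 + [16] + [14] * 2 + [13] * 4 + [12]
       + [10] * 5 + [9] * 3 + [8] * 7 + [7] * 2 + [6] + [5] + [4]
       + [3] * 6 + [2] * 4 + [1] * 2)

sets = partition_by_level(ell)
for value, indices in sets:
    print(value, indices)
```

The first and last entries are the indices 1 to 4 and 49 to 50, and the 16 sets in between agree with the list given above.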

Fig. 3: Solution of Example C: (a) Optimum linear functional pay-off subject to total variational
constraint, $D^+(R)$; (b) Optimum total variational pay-off subject to linear functional constraint,
$R^+(D)$; (c) Optimum linear functional pay-off subject to total variational constraint, $D^-(R)$; and,
(d) Optimum total variational pay-off subject to linear functional constraint, $R^-(D)$.

VII. Conclusion

This paper is concerned with extremum problems involving total variational distance metric as
a pay-off subject to linear functional constraints, and vice-versa; that is, with the roles of total
variational metric and linear functional interchanged. These problems are formulated using
concepts from signed measures while the theory is developed on abstract spaces. Certain properties
and applications of the extremum problems are discussed, while closed form expressions of the
extremum measures are derived for finite alphabet spaces. Finally, it is shown through examples
how the extremum solution of the various problems behaves. Extremum problems have a wide
variety of applications, spanning from Markov decision problems to model reduction.
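Appendix
Proof of Theorem IV.6

Lemma A.1. The following bounds hold.

1. Lower Bound.

$$\sum_{i\in\Sigma}\ell_i\xi_i^+ \ge \ell_{\min}\Big(\frac{\alpha}{2}\Big). \tag{72}$$

The bound holds with equality if

$$\sum_{i\in\Sigma_0}\mu_i + \frac{\alpha}{2} \le 1, \qquad \sum_{i\in\Sigma_0}\xi_i^+ = \frac{\alpha}{2}, \qquad \xi_i^+ = 0 \ \text{for } i\in\Sigma\setminus\Sigma_0,$$

and the optimal probability on $\Sigma_0$ is given by

$$\nu^*(\Sigma_0) \triangleq \sum_{i\in\Sigma_0}\nu_i^* = \min\Big(1,\ \sum_{i\in\Sigma_0}\mu_i + \frac{\alpha}{2}\Big).$$

2. Upper Bound.

$$\sum_{i\in\Sigma}\ell_i\xi_i^- \le \ell_{\max}\Big(\frac{\alpha}{2}\Big). \tag{73}$$

The bound holds with equality if

$$\sum_{i\in\Sigma^0}\mu_i - \frac{\alpha}{2} \ge 0, \qquad \sum_{i\in\Sigma^0}\xi_i^- = \frac{\alpha}{2}, \qquad \xi_i^- = 0 \ \text{for } i\in\Sigma\setminus\Sigma^0,$$

and the optimal probability on $\Sigma^0$ is given by

$$\nu^*(\Sigma^0) \triangleq \sum_{i\in\Sigma^0}\nu_i^* = \Big(\sum_{i\in\Sigma^0}\mu_i - \frac{\alpha}{2}\Big)^+.$$

Proof: Follows from Section III-A. ■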

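Proposition A.2. If $\sum_{i\in\Sigma_0}\mu_i + \frac{\alpha}{2} = 1$ and $\nu_i^* \ge \mu_i$ for all $i \in \Sigma_0$ then $R^-(D) = 2\big(1 - \sum_{i\in\Sigma_0}\mu_i\big)$.

Proof: The condition $\sum_{i\in\Sigma_0}\mu_i + \frac{\alpha}{2} = 1$ implies that $\sum_{i\in\Sigma_0}\nu_i^* = 1$ and therefore $\sum_{i\in\Sigma\setminus\Sigma_0}\nu_i^* = 0$,
hence $\nu_i^* = 0$ for all $i \in \Sigma\setminus\Sigma_0$. Then the minimum pay-off (52) is given by

$$R^-(D) = \sum_{i\in\Sigma_0}|\nu_i^* - \mu_i| + \sum_{i\in\Sigma\setminus\Sigma_0}|\nu_i^* - \mu_i| = \sum_{i\in\Sigma_0}|\nu_i^* - \mu_i| + \sum_{i\in\Sigma\setminus\Sigma_0}|-\mu_i|$$

$$\overset{(a)}{=} \sum_{i\in\Sigma_0}\nu_i^* - \sum_{i\in\Sigma_0}\mu_i + \sum_{i\in\Sigma\setminus\Sigma_0}\mu_i = 1 - \sum_{i\in\Sigma_0}\mu_i + 1 - \sum_{i\in\Sigma_0}\mu_i = 2\Big(1 - \sum_{i\in\Sigma_0}\mu_i\Big),$$

where (a) follows due to the fact that $\nu_i^* \ge \mu_i$ for all $i \in \Sigma_0$.
Next, we show the derivation of (54).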
Proof: Under the conditions stated, we have that

E = t E t£r + E + E = E E 4w + ff-E E » 

Also, 

a 



2 



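From (50), we have that

$$D \ge \sum_{i\in\Sigma}\ell_i\xi_i^+ - \sum_{i\in\Sigma}\ell_i\xi_i^- + \sum_{i\in\Sigma}\ell_i\mu_i$$

$$= \ell_{\min}\Big(\frac{\alpha}{2}\Big) - \sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\ell_i\mu_i - \ell(\Sigma^k)\Big(\frac{\alpha}{2} - \sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i\Big) + \sum_{i\in\Sigma}\ell_i\mu_i.$$

Solving the above equation with respect to $\alpha$, and noting that $\ell_{\min} - \ell(\Sigma^k) < 0$, we get that

$$\alpha \ge \frac{2\Big(D - \ell_{\min}\sum_{i\in\Sigma_0}\mu_i - \ell(\Sigma^k)\sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i - \sum_{j=k}^{r}\sum_{i\in\Sigma^{j}}\ell_i\mu_i\Big)}{\ell_{\min} - \ell(\Sigma^k)}.$$

If we select the solution at the boundary then (76) is obtained. From (75c) we have that
$\sum_{i\in\Sigma^k}\xi_i^- = \big(\frac{\alpha}{2} - \sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i\big)$, and hence,
$\sum_{i\in\Sigma^k}\nu_i = \sum_{i\in\Sigma^k}\mu_i - \big(\frac{\alpha}{2} - \sum_{j=1}^{k}\sum_{i\in\Sigma^{j-1}}\mu_i\big)$.

The optimal $\sum_{i\in\Sigma^k}\nu_i^*$ must satisfy (75a). Hence, (77) is obtained. ■
Putting together Lemma A.1, Proposition A.2, Lemma A.3, and Corollary A.4 we obtain the
result of Theorem IV.6.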